6+ years shipping LLM, RAG and agent systems on a React/Next.js, Node/NestJS, Python and AWS stack. I lead the engineering team at India's largest dental e-commerce platform — where I've proven these systems at real scale. The same patterns power any high-traffic product: SaaS, fintech, marketplaces and healthtech.
These problems come from running India's largest dental e-commerce platform, but any fast-growing product hits the same walls: support can't keep up, pages get slow, search gets expensive, traffic spikes break things, and the cloud bill spirals. Here's how I solved each at scale.
"Where's my order?", returns, refunds, cancellations — thousands of repetitive tickets per day, slow responses, rising support headcount.
A heavy client-rendered React site meant poor Core Web Vitals, weak Google rankings, and a costly third-party prerender service just to be crawlable.
A $2,300/month hosted search bill, limited relevance control, and customers who couldn't find products by image or voice.
The Magento catalogue service was slow to extend, expensive to run, and a bottleneck for new features.
Storage egress, an external ETL pipeline, and over-provisioned compute were quietly inflating monthly AWS spend.
E-commerce traffic isn't steady — sales, festival rushes and marketing pushes create sudden floods of concurrent requests. I architected the platform to absorb those spikes without falling over.
The AI assistant alone fields 5,000+ live queries a day against real order data, with concurrent reads and writes across MySQL, MongoDB and microservice APIs, while the storefront serves peak shopping traffic. The result: stable response times and no degradation when load surges.
The roadmap I'm following to stay at the front of production AI engineering.
Multi-agent orchestration (LangGraph, CrewAI), evaluation & guardrails, prompt/version management, and observability for LLM apps (LangSmith, tracing, cost/latency monitoring).
Kubernetes (EKS) for orchestration beyond ECS, load/stress testing (k6), and resilience patterns — circuit breakers, rate limiting, and graceful degradation under burst traffic.
AWS Certified Solutions Architect, deeper vector-DB tuning (pgvector, Pinecone, Qdrant), fine-tuning vs. RAG trade-offs, and streaming data (Kafka) for real-time personalization.
I take ideas from architecture to production — for SaaS, fintech, marketplaces, healthtech and e-commerce. Available for senior full-time roles and select freelance/contract engagements.
New Delhi, India · Open to Remote