Your LLM feature is live. Now make it actually work.

A paid 4-week live cohort for engineers who shipped a GenAI feature and own making it reliable.

From the author of the O'Reilly book on Retrieval-Augmented Generation and instructor of the O'Reilly live training on LLMs in Production.

This course is for you if:

  • You shipped a GenAI feature in the last year and you own keeping it working
  • You're seeing inconsistent outputs, silent failures, or user complaints you can't reproduce
  • You want a systematic approach — not another list of prompting tips

What you'll get

4 weeks of live sessions with Skanda: small group, real back-and-forth, no lecture theater. Each week targets a specific layer of the reliability problem, in the order you'd actually attack it in production. The material is drawn from his O'Reilly courses on GenAI in Production and AI Context Engineering.

Week 1: Diagnose

  • Why LLM evals are fundamentally different from traditional ML evaluation, and what that means for your system
  • The 4 context failure modes: poisoning, distraction, confusion, clash
  • Building a representative eval set from real production traffic, not synthetic data
  • Calibrating an LLM-as-Judge so you can stop evaluating by vibes (see the sketch below)
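
To make "calibrating an LLM-as-Judge" concrete, here's a minimal sketch. Everything in it is an illustrative stand-in: call_llm() is a hypothetical wrapper around whatever model provider you use, and the prompt is the simplest possible pass/fail rubric.

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around your model provider's API."""
        raise NotImplementedError("wire this up to your provider")

    JUDGE_PROMPT = (
        "You are grading an answer for factual accuracy.\n"
        "Question: {question}\n"
        "Answer: {answer}\n"
        "Reply with exactly one word: PASS or FAIL."
    )

    def judge(question: str, answer: str) -> str:
        verdict = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        return "PASS" if "PASS" in verdict.upper() else "FAIL"

    def agreement(labeled: list[dict]) -> float:
        """Fraction of human-labeled cases where the judge agrees.
        Each item: {"question": ..., "answer": ..., "human_label": "PASS" or "FAIL"}."""
        hits = sum(
            judge(ex["question"], ex["answer"]) == ex["human_label"]
            for ex in labeled
        )
        return hits / len(labeled)

Until agreement() is high on a sample your team labeled by hand, the judge's scores are still vibes, just automated ones.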

Week 2: Context Engineering

  • Context as a first-class data layer — retrieve the right evidence, structure it, inject it efficiently
  • RAG in production: hybrid search, re-ranking, and query rewriting when basic retrieval breaks (see the sketch after this list)
  • Memory strategies: when to vectorize, summarize, or use an entity store
  • Keeping context fresh and relevant without ballooning token costs
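
A taste of the mechanics: reciprocal rank fusion (RRF), one common way to merge keyword and vector rankings in hybrid search. The two input lists are assumed to come from your existing BM25 and embedding indexes.

    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        """Fuse several ranked lists of doc ids into one.
        k=60 is the constant used in the original RRF paper."""
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    bm25_hits = ["doc3", "doc1", "doc7"]    # keyword results, best first
    vector_hits = ["doc1", "doc9", "doc3"]  # embedding results, best first
    print(rrf([bm25_hits, vector_hits]))    # docs ranked well by both rise to the top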

Week 3: Agents

  • When to go agentic — and when it will make your reliability problem worse
  • Tool use, routing, and building loops that fail gracefully (see the sketch after this list)
  • MCP, the emerging standard for connecting agents to data and tools
  • Evaluating agents: testing multi-step systems without gold-standard labels
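
What "fails gracefully" means in code, sketched with a hypothetical call_llm() that returns either a tool request or a final answer (your framework's shapes will differ):

    MAX_STEPS = 5

    def call_llm(messages: list[dict]) -> dict:
        """Hypothetical: returns {"tool": name, "args": {...}} or {"answer": text}."""
        raise NotImplementedError("wire this up to your provider")

    TOOLS = {"search_orders": lambda args: "order 1234: shipped"}  # your real tools here

    def run_agent(user_message: str) -> str:
        messages = [{"role": "user", "content": user_message}]
        for _ in range(MAX_STEPS):              # hard cap: the loop cannot spin forever
            decision = call_llm(messages)
            if "answer" in decision:
                return decision["answer"]
            tool = TOOLS.get(decision["tool"])
            if tool is None:                    # model asked for a tool that doesn't exist
                messages.append({"role": "tool", "content": "unknown tool: " + decision["tool"]})
                continue
            try:
                result = str(tool(decision["args"]))
            except Exception as exc:            # feed the failure back instead of crashing
                result = f"tool error: {exc}"
            messages.append({"role": "tool", "content": result})
        return "Sorry, I couldn't complete that request."  # degrade, don't hang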

Week 4: Production

  • Prompt versioning and model management across iterations
  • Monitoring for latency, cost, and quality drift — what to instrument and what to ignore
  • Context pruning strategies: when to summarize, truncate, or fix the retrieval (see the sketch after this list)
  • When to fine-tune vs. prompt-engineer vs. swap the model entirely
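
And a sketch of the pruning decision itself. Whitespace counting stands in for a real tokenizer and summarize() stands in for an LLM summarization call; both are illustrative assumptions.

    TOKEN_BUDGET = 2000

    def n_tokens(text: str) -> int:
        return len(text.split())                # crude stand-in for your model's tokenizer

    def summarize(text: str) -> str:
        raise NotImplementedError("hypothetical LLM summarization call")

    def prune(chunks: list[str], budget: int = TOKEN_BUDGET) -> list[str]:
        """Keep the newest chunks verbatim; summarize everything older."""
        kept, used, overflow = [], 0, []
        for chunk in reversed(chunks):          # walk newest-to-oldest
            if used + n_tokens(chunk) <= budget:
                kept.append(chunk)
                used += n_tokens(chunk)
            else:
                overflow.append(chunk)          # older context that no longer fits
        if overflow:
            # compress instead of silently dropping; in practice, budget the summary too
            kept.append(summarize("\n".join(reversed(overflow))))
        return list(reversed(kept))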