Your LLM feature is live. Now make it actually work.

A paid 4-week live cohort for engineers who shipped a GenAI feature and own making it reliable.

From the author of the O'Reilly book on Retrieval-Augmented Generation and instructor of the O'Reilly live training on LLMs in Production.

This course is for you if:

  • You shipped a GenAI feature in the last year and you own keeping it working
  • You're seeing inconsistent outputs, silent failures, or user complaints you can't reproduce
  • You want a systematic approach — not another list of prompting tips

What you'll get

4 weeks of live sessions with Skanda: small group, real back-and-forth, no lecture theater. Each week targets a specific layer of the reliability problem, in the order you'd actually attack it in production. The material is drawn from his O'Reilly courses on GenAI in Production and AI Context Engineering.

Week 1: Diagnose

  • Why LLM evals are fundamentally different from traditional ML evaluation, and what that means for your system
  • The 4 context failure modes: poisoning, distraction, confusion, clash
  • Building a representative eval set from real production traffic, not synthetic data
  • Calibrating an LLM-as-Judge so you can stop evaluating by vibes (see the sketch below)
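
To make "calibrating an LLM-as-Judge" concrete, here's a minimal sketch. Everything in it is an illustrative stand-in: call_llm() is a hypothetical wrapper around whatever model provider you use, and the prompt is the simplest possible pass/fail rubric.

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around your model provider's API."""
        raise NotImplementedError("wire this up to your provider")

    JUDGE_PROMPT = (
        "You are grading an answer for factual accuracy.\n"
        "Question: {question}\n"
        "Answer: {answer}\n"
        "Reply with exactly one word: PASS or FAIL."
    )

    def judge(question: str, answer: str) -> str:
        verdict = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        return "PASS" if "PASS" in verdict.upper() else "FAIL"

    def agreement(labeled: list[dict]) -> float:
        """Fraction of human-labeled cases where the judge agrees.
        Each item: {"question": ..., "answer": ..., "human_label": "PASS" or "FAIL"}."""
        hits = sum(
            judge(ex["question"], ex["answer"]) == ex["human_label"]
            for ex in labeled
        )
        return hits / len(labeled)

Until agreement() is high on a sample your team labeled by hand, the judge's scores are still vibes, just automated ones.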

Week 2: Context Engineering

  • Context as a first-class data layer — retrieve the right evidence, structure it, inject it efficiently
  • RAG in production: hybrid search, re-ranking, and query rewriting when basic retrieval breaks (see the sketch after this list)
  • Memory strategies: when to vectorize, summarize, or use an entity store
  • Keeping context fresh and relevant without ballooning token costs
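
A taste of the mechanics: reciprocal rank fusion (RRF), one common way to merge keyword and vector rankings in hybrid search. The two input lists are assumed to come from your existing BM25 and embedding indexes.

    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        """Fuse several ranked lists of doc ids into one.
        k=60 is the constant used in the original RRF paper."""
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    bm25_hits = ["doc3", "doc1", "doc7"]    # keyword results, best first
    vector_hits = ["doc1", "doc9", "doc3"]  # embedding results, best first
    print(rrf([bm25_hits, vector_hits]))    # docs ranked well by both rise to the top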

Week 3: Agents

  • When to go agentic — and when it will make your reliability problem worse
  • Tool use, routing, and building loops that fail gracefully (see the sketch after this list)
  • MCP, the emerging standard for connecting agents to data and tools
  • Evaluating agents: testing multi-step systems without gold-standard labels
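
What "fails gracefully" means in code, sketched with a hypothetical call_llm() that returns either a tool request or a final answer (your framework's shapes will differ):

    MAX_STEPS = 5

    def call_llm(messages: list[dict]) -> dict:
        """Hypothetical: returns {"tool": name, "args": {...}} or {"answer": text}."""
        raise NotImplementedError("wire this up to your provider")

    TOOLS = {"search_orders": lambda args: "order 1234: shipped"}  # your real tools here

    def run_agent(user_message: str) -> str:
        messages = [{"role": "user", "content": user_message}]
        for _ in range(MAX_STEPS):              # hard cap: the loop cannot spin forever
            decision = call_llm(messages)
            if "answer" in decision:
                return decision["answer"]
            tool = TOOLS.get(decision["tool"])
            if tool is None:                    # model asked for a tool that doesn't exist
                messages.append({"role": "tool", "content": "unknown tool: " + decision["tool"]})
                continue
            try:
                result = str(tool(decision["args"]))
            except Exception as exc:            # feed the failure back instead of crashing
                result = f"tool error: {exc}"
            messages.append({"role": "tool", "content": result})
        return "Sorry, I couldn't complete that request."  # degrade, don't hang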

Week 4: Production

  • Prompt versioning and model management across iterations
  • Monitoring for latency, cost, and quality drift — what to instrument and what to ignore
  • Context pruning strategies: when to summarize, truncate, or fix the retrieval (see the sketch after this list)
  • When to fine-tune vs. prompt-engineer vs. swap the model entirely
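
And a sketch of the pruning decision itself. Whitespace counting stands in for a real tokenizer and summarize() stands in for an LLM summarization call; both are illustrative assumptions.

    TOKEN_BUDGET = 2000

    def n_tokens(text: str) -> int:
        return len(text.split())                # crude stand-in for your model's tokenizer

    def summarize(text: str) -> str:
        raise NotImplementedError("hypothetical LLM summarization call")

    def prune(chunks: list[str], budget: int = TOKEN_BUDGET) -> list[str]:
        """Keep the newest chunks verbatim; summarize everything older."""
        kept, used, overflow = [], 0, []
        for chunk in reversed(chunks):          # walk newest-to-oldest
            if used + n_tokens(chunk) <= budget:
                kept.append(chunk)
                used += n_tokens(chunk)
            else:
                overflow.append(chunk)          # older context that no longer fits
        if overflow:
            # compress instead of silently dropping; in practice, budget the summary too
            kept.append(summarize("\n".join(reversed(overflow))))
        return list(reversed(kept))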