
AI Evals For Engineers & PMs

★ 4.8 (345) • 4 weeks • Cohort-based course

Learn proven approaches for quickly improving AI applications. Build AI that works better than the competition, regardless of the use case.

✔ This course is popular — 50+ people enrolled last week.
Hamel Husain and Shreya Shankar
ML Engineers with 25+ years combined building & evaluating AI systems

Course Overview

Eliminate the guesswork of building AI applications with data-driven approaches.

⚠ Oct 6 is the last cohort we are teaching this year. Seats are limited.

AI Evals For Engineers & PMs

Proven processes from 25+ AI implementations
★★★★★ (345 reviews)
$2,300 USD
Next Cohort: Oct 6 – Nov 1, 2025
3 days left to enroll
Enroll
Get reimbursed • Save 20% with a team
Next cohort: Oct 6, 2025 • Limited seats
New cohort • Fall 2025

Ship Confidently with AI Evaluations

A hands-on cohort to build rubrics, golden sets, and online experiments so your AI features deliver consistent, measurable quality.

4–6 hrs/week • Oct 6 – Nov 1 • Live + community • Remote, global

Instructors

Hamel Husain and Shreya Shankar
ML Engineers with 25+ years combined building & evaluating AI systems
1,240+
Alumni
4.9/5
Average Rating
32
Countries
6
Live Sessions
🚨 Oct 6 is the last cohort we are teaching this year. Seats are limited. 🚨

Enrollment Notes

  • All students get LIFETIME ACCESS to all materials and recordings.
  • We hold 9+ hours of office hours to maximize the value of live interaction.
  • October 6th Cohort Only: 10 months of unlimited access to our new AI Eval Assistant (details near the bottom of this page).
  • Students get lifetime access to a Discord community with 1k+ students and instructors.
  • This is a flipped classroom. Lectures are professionally recorded; live time focuses on office hours and interaction.

Do you catch yourself asking…

  1. How do I test applications when outputs are stochastic and require subjective judgements?
  2. If I change the prompt, how do I know I’m not breaking something else?
  3. Where should I focus my engineering efforts—do I need to test everything?
  4. What if I have no data or customers—where do I start?
  5. What metrics should I track? What tools should I use? Which models are best?
  6. Can I automate testing and evaluation—and if so, how do I trust it?

If you aren’t sure about the answers, this course is for you.

Hands-on for engineers & technical PMs. Great for coders and “vibe coders.”

What to Expect

Hands-on exercises with code & data. We meet twice a week for four weeks with generous office hours and an active Discord community.

You’ll build skills that set you apart (see testimonials). All sessions are recorded for async viewing.

Course Content

Lesson 1: Fundamentals & Lifecycle of LLM Application Evaluation

  • Why evaluation matters—business impact & risk mitigation
  • Challenges unique to LLM outputs
  • Lifecycle from development → production
  • Basic instrumentation & observability
  • Intro to error analysis & failure categorization

Lesson 2: Systematic Error Analysis

  • Synthetic data generation
  • Annotation strategies & quant analysis
  • Translating errors into improvements
  • Avoiding common pitfalls
  • Exercise: Build an error tracking system
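
A minimal sketch of what the error-tracking exercise might produce, in Python; the record fields and failure categories are illustrative assumptions, not the course's exact schema:

```python
# Minimal error-tracking sketch. Field names and failure categories are
# illustrative, not a reference schema.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ErrorRecord:
    trace_id: str   # which logged interaction the error came from
    category: str   # open-coded failure mode, e.g. "hallucinated_citation"
    note: str       # short annotator note explaining the judgement

def failure_mode_counts(records: list[ErrorRecord]) -> Counter:
    """Count how often each failure mode occurs, to prioritize fixes."""
    return Counter(r.category for r in records)

records = [
    ErrorRecord("t1", "hallucinated_citation", "cited a paper that does not exist"),
    ErrorRecord("t2", "ignored_instruction", "answered in prose despite a JSON request"),
    ErrorRecord("t3", "hallucinated_citation", "invented a URL"),
]
print(failure_mode_counts(records).most_common())
```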

Lesson 3: Implementing Effective Evaluations

  • Metrics: code-based & LLM-judge
  • Per-output & system-level evaluation
  • Dataset structure for inputs & references
  • Exercise: Automated evaluation pipeline
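
To make the code-based vs. LLM-judge distinction concrete, here is a small illustrative pipeline; the dataset shape is an assumption, and the judge is left as a stub rather than a real API call:

```python
# Sketch of a tiny evaluation pipeline mixing a code-based metric with a
# stubbed LLM judge. The dataset shape is an assumption, not a reference
# implementation.
import json

def is_valid_json(output: str) -> bool:
    """Code-based metric: does the model output parse as JSON?"""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def llm_judge_passes(question: str, output: str) -> bool:
    """Placeholder for an LLM-as-judge call that applies a rubric and
    returns pass/fail; wiring up a real judge model is out of scope here."""
    raise NotImplementedError

def run_evals(dataset: list[dict]) -> dict:
    """Each row: {"question": ..., "output": ...}. Returns metric averages."""
    json_ok = [is_valid_json(row["output"]) for row in dataset]
    return {"json_validity": sum(json_ok) / len(json_ok)}

dataset = [
    {"question": "Return the user as JSON", "output": '{"name": "Ada"}'},
    {"question": "Return the user as JSON", "output": "Sure! The user is Ada."},
]
print(run_evals(dataset))  # {'json_validity': 0.5}
```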

Lesson 4: Collaborative Evaluation Practices

  • Team workflows for evaluation
  • Inter-annotator agreement (see the sketch after this list)
  • Consensus on criteria
  • Exercise: Alignment in breakout groups
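
As a taste of the agreement work, a rough Cohen's kappa sketch for two annotators applying the same pass/fail rubric; the labels below are made up:

```python
# Rough Cohen's kappa for two annotators applying the same pass/fail rubric.
# Low kappa suggests the rubric needs alignment before trusting one reviewer.
def cohens_kappa(a: list[str], b: list[str]) -> float:
    labels = set(a) | set(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

annotator_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
annotator_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.33
```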

Lesson 5: Architecture-Specific Strategies

  • RAG: retrieval relevance & factual accuracy (see the sketch after this list)
  • Multi-step pipelines & error propagation
  • Tool use & multi-turn conversations
  • Multi-modal evals (text/image/audio)
  • Exercise: Targeted test suites
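
Retrieval quality in RAG is often checked with simple rank metrics; a recall@k sketch, assuming relevant document ids have been labeled per query (the data is illustrative):

```python
# recall@k for the retrieval step of a RAG pipeline, assuming relevant
# document ids have been labeled per query. Data below is illustrative.
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

retrieved_docs = ["doc_7", "doc_2", "doc_9", "doc_4"]
relevant_docs = {"doc_2", "doc_4"}
print(recall_at_k(retrieved_docs, relevant_docs, k=3))  # 0.5
```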

Lesson 6: Production Monitoring & Continuous Evaluation

  • Traces, spans, sessions
  • CI/CD quality gates (see the sketch after this list)
  • Comparable experiments
  • Safety & quality guardrails
  • Exercise: Monitoring dashboard
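
A CI quality gate can be as small as a test that fails the build when a metric regresses; a hypothetical pytest sketch, where the results file, metric name, and 0.90 bar are assumptions:

```python
# Hypothetical pytest quality gate: the build fails if faithfulness drops
# below a threshold. The results file, metric name, and 0.90 bar are
# assumptions about what an earlier pipeline step produced.
import json
from pathlib import Path

THRESHOLD = 0.90

def load_eval_results(path: str = "eval_results.json") -> dict:
    return json.loads(Path(path).read_text())

def test_faithfulness_gate():
    results = load_eval_results()
    assert results["faithfulness"] >= THRESHOLD, (
        f"faithfulness {results['faithfulness']:.2f} is below {THRESHOLD}"
    )
```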

Lesson 7: Continuous Human Review Systems

  • Strategic sampling (see the sketch after this list)
  • Reviewer UX for productivity
  • Exercise: Continuous feedback collection
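
Strategic sampling usually means not reviewing traces uniformly at random; one possible heuristic, sketched with illustrative field names, oversamples the traces an automated judge was least confident about:

```python
# One possible review-sampling heuristic: take the traces the automated
# judge was least confident about, then fill the rest of the quota at
# random. Field names are illustrative.
import random

def sample_for_review(traces: list[dict], n: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    by_confidence = sorted(traces, key=lambda t: t["judge_confidence"])
    low_conf = by_confidence[: n // 2]   # guaranteed review for shaky calls
    rest = by_confidence[n // 2 :]
    return low_conf + rng.sample(rest, n - len(low_conf))

traces = [{"id": i, "judge_confidence": (i * 37 % 100) / 100} for i in range(100)]
print([t["id"] for t in sample_for_review(traces, n=10)])
```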

Lesson 8: Cost Optimization

  • Value vs spend for LLM apps
  • Intelligent model routing (see the sketch after this list)
  • Exercise: Real-world cost optimization
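
Intelligent routing can start as a simple heuristic that sends easy requests to a cheaper model; a toy sketch with placeholder model names (a production router would be driven by eval results, not request length):

```python
# Toy routing heuristic: short, single-question requests go to a cheaper
# model, everything else escalates. Model names are placeholders, not real
# endpoints.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "frontier-model"

def route(request: str) -> str:
    is_simple = len(request) < 200 and request.count("?") <= 1
    return CHEAP_MODEL if is_simple else STRONG_MODEL

print(route("What is our refund policy?"))  # small-model
print(route("Compare these three contracts and flag risky clauses. " * 10))  # frontier-model
```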

What you'll be able to do

This cohort focuses on shipping. Each module ends with templates and a checklist you can adapt to your stack.

Design robust evaluation suites for AI features
Ship quality gates that reduce regressions by 40%+
Operationalize offline + online evals with minimal infra
Align model metrics with business outcomes

Next cohort details

Start date: Mon, Oct 6, 2025
Duration: 4 weeks • 2 live sessions/week
Cohort size: ~60 learners
Format: Remote • Zoom + Discord
Apply Now

Week 1

Foundations of AI Evals
  • Offline, online, human-in-the-loop
  • Representative datasets
  • Golden sets & rubrics
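
For a concrete picture of a golden set, here is one illustrative entry pairing a frozen input with a reference answer and rubric criteria; the schema is an assumption, not a prescribed format:

```python
# One illustrative golden-set entry: a frozen input, a reference answer,
# and rubric criteria to grade new outputs against. The schema is an
# assumption, not a prescribed format.
golden_example = {
    "id": "refund-policy-001",
    "input": "Can I get a refund after 30 days?",
    "reference_answer": (
        "Refunds are available within 30 days of purchase; after that, "
        "store credit only."
    ),
    "rubric": [
        "Mentions the 30-day window",
        "Does not promise a cash refund after 30 days",
        "Offers the store-credit alternative",
    ],
}
print(golden_example["rubric"])
```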

Week 2

Designing Rubrics & Quality Bars
  • From UX goals → measurable criteria
  • LLM-as-a-judge pitfalls
  • Inter-rater agreement & drift

Week 3

Automation & Tooling
  • Test runners & CI hooks
  • Prompt/Model versioning
  • Budgeting & observability

Week 4

Online Evals & Experimentation
  • A/B, interleaving & bandits (see the sketch after this list)
  • Guardrails & real-time feedback
  • Rollout playbooks
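
As a rough sketch of how an A/B comparison on a binary success metric can be summarized, a two-proportion z-test with made-up counts; interleaving and bandit designs from this week would be analyzed differently:

```python
# Two-proportion z-test on a binary task-success metric for prompt A vs
# prompt B. Counts are made up.
from math import sqrt, erf

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return z, p_value

z, p = two_proportion_z(success_a=410, n_a=500, success_b=445, n_b=500)
print(f"z={z:.2f}, p={p:.4f}")  # variant B looks better; check p before rolling out
```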

Capstone

Shipping Your Eval Suite
  • KPIs & data contracts
  • CI quality gates
  • Executive-ready readout

Build once, reuse everywhere

Your capstone delivers a working eval suite for a real feature in your product. We provide reference data contracts, CI examples, and an executive-ready readout.

Example artifacts from past capstones

What learners say

★★★★★

“We replaced ad-hoc checks with a lightweight eval pipeline and cut incident rates by 52%. The capstone became our internal standard.”

Priya Sharma
PM, Fintech
★★★★★

“The rubric workshop finally aligned UX, product, and research. Our win-rate in experiments jumped immediately.”

Alex Chen
Sr. MLE, SaaS
★★★★★

“Clear, practical, and opinionated. The templates saved weeks of trial-and-error.”

Marcos Pérez
Head of Product, Healthtech

Tuition & Scholarships

We reserve seats for builders from underrepresented backgrounds and offer team pricing for groups of 3+.

Employer reimbursement guide
VAT/GST compliant invoices
Need-based scholarships available
Professional Track
$2,300
USD
  • 10+ hours live instruction
  • Capstone project review
  • Private community
  • Templates & checklists
Apply & Reserve Seat
Team pricing available

Get the syllabus

Enter your work email and we’ll send the PDF.

We’ll never spam. Unsubscribe anytime.

Talk to the team

FAQs

Who is this course for?
Product engineers, ML engineers, applied scientists, and PMs shipping AI features who need a pragmatic evaluation strategy.
What is the time commitment?
Expect ~4–6 hours/week: 2 live sessions, assignments, and async peer feedback.
Will sessions be recorded?
Yes. Recordings and slides are shared after each session along with code templates.
Do I need prior ML experience?
Basic familiarity with prompts/APIs is helpful. We provide templates for metrics, rubrics, and CI integration.

Ready to raise the quality bar?

Join a focused cohort of builders and leave with a reusable evaluation foundation you can ship immediately.
