Exercise 3: LLM as a Judge System

Build an AI quality control system where one LLM generates content and another LLM evaluates it, providing feedback and iterating until quality standards are met.

Exercise 3 Overview

60-Minute Hands-On Exercise

This exercise teaches loops and quality control: building AI systems that check their own work and retry with feedback. The goal is more consistent, reliable AI output with far less manual review.

The Problem: AI-generated content is inconsistent. The same prompt can produce brilliant results or complete rubbish. Manual review doesn’t scale, and a single generation call has no built-in quality control. You need an automated system where one AI generates content and another AI judges it, feeding back corrections until the output meets your standards or an attempt limit is reached.

Why Loops + Quality Control Matter:

You’ll learn to make AI reliable by having it check and improve its own work automatically:

✅ Consistent quality: Only get output that meets your standards
✅ No manual review: AI judges AI, you set the criteria
✅ Self-improving: Failed attempts get feedback and try again
✅ Scalable quality: Review 1000 pieces as easily as 1
✅ Peace of mind: Every result comes back with a pass/fail status, including runs that stop at the attempt limit

Production AI systems commonly use this generate-and-evaluate pattern to keep quality consistent at scale.

What You’ll Build

A complete AI quality control workflow featuring:

Web form interface for task submission with custom quality criteria
Generator AI that creates content based on instructions
Judge AI that evaluates output against your specific criteria
Feedback loop with iterative improvement (up to 10 attempts)
Structured evaluation with pass/fail status and actionable feedback
Loop control logic preventing infinite iterations

graph LR
    A[Form Input] --> B[Generate Content]
    B --> C[Evaluate Quality]
    C --> D{Pass?}
    D -->|No & < 10| E[Incorporate Feedback]
    E --> B
    D -->|Yes or Max| F[Return Result]

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#fff3e0
    style D fill:#f3e5f5
    style F fill:#e8f5e9
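
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the workflow you will build in Part B: `run_quality_loop`, `Verdict`, `Result`, and the two stand-in functions are hypothetical names, and the stand-ins replace real model calls so the loop logic can be run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 10  # mirrors the "up to 10 attempts" limit in the list above


@dataclass
class Verdict:
    passed: bool   # True when the judge accepts the content
    feedback: str  # actionable notes for the next attempt


@dataclass
class Result:
    content: str
    passed: bool
    attempts: int


def run_quality_loop(
    task: str,
    criteria: str,
    generate: Callable[[str, Optional[str]], str],
    judge: Callable[[str, str], Verdict],
    max_attempts: int = MAX_ATTEMPTS,
) -> Result:
    """Generate, evaluate, and retry with feedback until pass or attempt limit."""
    feedback: Optional[str] = None
    content = ""
    for attempt in range(1, max_attempts + 1):
        content = generate(task, feedback)   # "Generate Content"
        verdict = judge(content, criteria)   # "Evaluate Quality"
        if verdict.passed:                   # "Pass?" -> Yes
            return Result(content, True, attempt)
        feedback = verdict.feedback          # "Incorporate Feedback"
    # Attempt limit reached: return the last draft, marked as not passed.
    return Result(content, False, max_attempts)


# Stand-ins for the two models (assumptions, for illustration only).
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return task if feedback is None else f"{task} (revised: {feedback})"


def fake_judge(content: str, criteria: str) -> Verdict:
    if "revised" in content:
        return Verdict(True, "")
    return Verdict(False, "add a call to action")


result = run_quality_loop(
    "Write a tagline", "must include a call to action", fake_generate, fake_judge
)
print(result.passed, result.attempts)
```

Returning the last draft with `passed=False` when the limit is hit corresponds to the "Yes or Max" exit in the diagram: the loop always terminates, and the caller can tell a genuine pass from a capped run.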

Skills You’ll Demonstrate

By completing this exercise, you will have demonstrated:

Loop Logic Design: Building iterative workflows with exit conditions
Dual AI Agent Setup: Configuring two AI models for different roles (creation vs evaluation)
Structured Output Parsing: Extracting consistent JSON from AI responses
Quality Gate Implementation: Creating pass/fail criteria and feedback mechanisms
Variable Management: Tracking state across loop iterations (counters, previous feedback)
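
As a rough illustration of the structured output parsing skill, the sketch below pulls a pass/fail verdict out of a judge reply that may wrap its JSON in extra prose. The field names `status` and `feedback`, and the names `Evaluation` and `parse_evaluation`, are assumptions made for this example; the exercise may use a different schema.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Evaluation:
    passed: bool
    feedback: str


def parse_evaluation(raw: str) -> Evaluation:
    """Extract a JSON object from a judge reply; treat unparseable replies as a fail."""
    # Grab everything from the first "{" to the last "}" so surrounding chatter is ignored.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return Evaluation(False, "Judge reply contained no JSON object.")
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return Evaluation(False, "Judge reply was not valid JSON.")
    # Assumed schema: {"status": "pass" | "fail", "feedback": "..."}
    status = str(data.get("status", "")).strip().lower()
    return Evaluation(status == "pass", str(data.get("feedback", "")))


reply = 'Sure! {"status": "fail", "feedback": "Too long."} Hope that helps.'
print(parse_evaluation(reply))
```

Failing closed (any reply that cannot be parsed counts as a fail) is a deliberate choice here: a malformed judge response triggers another attempt instead of letting unchecked content through.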

The Pattern You’ll Master

The AI Quality Control & Iteration pattern you’ll learn applies to:

Content Marketing: Generate brand-aligned copy that consistently meets tone and style guidelines
Code Generation: Create code that passes security, performance, and style standards
Customer Service: Produce empathetic, accurate responses before sending
Product Descriptions: Generate compelling copy that includes all required information
Email Campaigns: Create messages that pass compliance and brand guidelines
Educational Content: Produce learning materials that meet pedagogical standards

Exercise Structure

⚙️ Part A: Setup

Google Gemini API + tracking spreadsheet

🔧 Part B: Build & Test

Generator, judge, loop with exit conditions

🏆 Challenges

Multi-criteria, A/B testing, optimisation

Section	What You’ll Do	Duration
Part A: Setup	Configure credentials, create results tracking sheet	15 min
Part B: Build & Test	Build complete workflow with iterative quality control	40 min
Challenges	Extend with advanced evaluation and testing patterns	Optional