AI Side Project

Mercer Street: 5 specialist agents, sub-10-second outfit recommendations, and a self-correcting LLM loop that strips hallucinations

7 min read · 2025

5-agent orchestration · SerpAPI · TypeScript · OpenAI · Hybrid retrieval · Supabase

Specialist agents: 5

End-to-end latency: <10s

Hallucinated product URLs in production: near-zero

Retrieval: Hybrid (LLM proposes · SerpAPI grounds)

Context

Role: Sole builder

Team: Solo, with AI-assisted development

Timeline: 6 weeks to first production version

Stack: TypeScript, OpenAI API, SerpAPI, Supabase, Deno Edge Functions

Key innovation: Deterministic post-processor over LLM output; near-zero hallucination on product URLs

The problem

AI fashion recommendation is a solved research problem. Making it trustworthy in production is not.

The fashion recommendation space was crowded with LLM demos by 2025. The pattern was predictable: upload a photo, describe an occasion, and get a list of "recommended items" with beautiful descriptions and purchase links that don't exist, lead to the wrong products, or point to sold-out items.

The interesting design question wasn't "how do we recommend better outfits?" It was "how do we architect an agent system where hallucination at the product level is structurally impossible?"

The architecture

The LLM proposes. SerpAPI grounds. A deterministic post-processor validates. Nothing else ships.

5-agent pipeline

Style Analyst

Reads user profile, occasion, existing wardrobe signals

Profile → Style brief

Outfit Architect

Designs concept: palette, silhouette, category list — no products yet

Brief → Outfit spec

Product Proposer

Generates search queries per item — intentionally not URLs

Outfit spec → Search queries

SerpAPI Grounder

Executes real product searches, returns live URLs and prices

Queries → Live products

Stylist Composer

Writes final recommendation using only verified products

Verified products → Final rec

Self-correction loop: if the Grounder returns fewer than 2 valid results for a category, the Proposer regenerates its queries for that category (max 2 loops).
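The retry rule can be sketched in a few lines. This is a minimal illustration, not the production code: the `Product` shape, the `ProposeQueries` and `GroundQueries` function types, and the name `groundCategory` are all hypothetical, and it assumes "max 2 loops" means one initial attempt plus up to two regenerations.

```typescript
// Hypothetical shapes; the real types are not shown in this write-up.
interface Product {
  url: string;
  title: string;
  price: number;
}

// Stand-ins for the Product Proposer (LLM) and the SerpAPI Grounder.
type ProposeQueries = (category: string, attempt: number) => Promise<string[]>;
type GroundQueries = (queries: string[]) => Promise<Product[]>;

const MIN_VALID_RESULTS = 2; // fewer than 2 valid results triggers a retry
const MAX_RETRY_LOOPS = 2; // assumed: regenerations allowed after the first pass

async function groundCategory(
  category: string,
  propose: ProposeQueries,
  ground: GroundQueries,
): Promise<Product[]> {
  let products: Product[] = [];
  // Attempt 0 is the initial pass; attempts 1..MAX_RETRY_LOOPS are regenerations.
  for (let attempt = 0; attempt <= MAX_RETRY_LOOPS; attempt++) {
    const queries = await propose(category, attempt);
    products = await ground(queries);
    if (products.length >= MIN_VALID_RESULTS) break;
  }
  // May still hold fewer than MIN_VALID_RESULTS if every attempt fell short.
  return products;
}
```

Passing the attempt number to the proposer lets it vary its queries on a retry. The loop is bounded, so a category with no good matches costs at most three grounding passes under these assumptions.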

The deterministic post-processor: Before any recommendation ships, a non-LLM validation function checks every URL in the Stylist Composer output against the SerpAPI result set. Any URL not in the verified set is stripped. This is the fallback of last resort.
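A check like this fits in one pure function. The sketch below assumes the Composer output has already been parsed into items that each carry a single URL, and that "stripped" means the whole item is dropped. Both are assumptions, as are the names `RecommendedItem` and `stripUnverifiedUrls`.

```typescript
// Hypothetical shape for one entry in the Stylist Composer output.
interface RecommendedItem {
  description: string;
  url: string;
}

interface ValidationResult {
  kept: RecommendedItem[]; // items whose URL came from the grounding step
  stripped: RecommendedItem[]; // items whose URL was never returned by search
}

// Pure, non-LLM validation: a URL ships only if it is an exact match
// for one of the URLs in the verified search results.
function stripUnverifiedUrls(
  items: RecommendedItem[],
  verifiedUrls: string[],
): ValidationResult {
  const verified = new Set<string>(verifiedUrls);
  const kept: RecommendedItem[] = [];
  const stripped: RecommendedItem[] = [];
  for (const item of items) {
    if (verified.has(item.url)) {
      kept.push(item);
    } else {
      stripped.push(item);
    }
  }
  return { kept, stripped };
}
```

The comparison here is exact string equality. A real implementation might normalize URLs first (trailing slashes, tracking parameters), but the write-up does not say whether it does.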

Latency engineering

Sub-10 seconds with 5 sequential agents required careful parallelism

Running all 5 agents naively, one call after another, would take 20–30 seconds. Total p95 latency in production: 8.4 seconds.

Agents 1–2: Sequential — each depends on prior output.

Agent 3 query generation: Parallelized per item category.

Agent 4 SerpAPI calls: All categories in parallel — most time-sensitive at ~2.5s avg.

Agent 5: Streams to UI as soon as all grounding is complete.
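The scheduling above maps onto `Promise.all`. The following is a sketch under assumptions: the `Stages` interface and `runPipeline` are hypothetical names, every stage is reduced to strings for brevity, and streaming is left out, so `compose` simply returns the final text.

```typescript
// Hypothetical, simplified contracts for the five stages.
interface OutfitSpec {
  categories: string[];
}

interface GroundedCategory {
  category: string;
  products: string[]; // verified product URLs for this category
}

interface Stages {
  analyze: (profile: string) => Promise<string>; // Agent 1
  architect: (brief: string) => Promise<OutfitSpec>; // Agent 2
  proposeQueries: (category: string) => Promise<string[]>; // Agent 3
  ground: (queries: string[]) => Promise<string[]>; // Agent 4
  compose: (grounded: GroundedCategory[]) => Promise<string>; // Agent 5
}

async function runPipeline(profile: string, stages: Stages): Promise<string> {
  // Agents 1-2 run sequentially: each depends on the prior output.
  const brief = await stages.analyze(profile);
  const spec = await stages.architect(brief);

  // Agents 3-4 fan out: every category proposes and grounds concurrently.
  const grounded = await Promise.all(
    spec.categories.map(async (category): Promise<GroundedCategory> => {
      const queries = await stages.proposeQueries(category);
      const products = await stages.ground(queries);
      return { category, products };
    }),
  );

  // Agent 5 starts only once all grounding has completed.
  return stages.compose(grounded);
}
```

With this shape, the fan-out costs roughly as much as the slowest category rather than the sum of all categories. `Promise.all` also preserves input order, so the Composer sees categories in the order the Architect listed them.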

The lesson: Hallucination isn't a model problem — it's an architecture problem. If the LLM never has the chance to invent a URL, it can't.
