AI Side Project

Mercer Street: 5 specialist agents, sub-10-second outfit recommendations, and a self-correcting LLM loop that strips hallucinations

7 min read · 2025 · AI Side Project
5-agent orchestration · SerpAPI · TypeScript · OpenAI · Hybrid retrieval · Supabase
Specialist agents: 5
End-to-end latency: <10s
Hallucinated product URLs in production: 0
Hybrid: LLM proposes · SerpAPI grounds
Context
Role: Sole builder
Team: Solo, with AI-assisted development
Timeline: 6 weeks to first production version
Stack: TypeScript, OpenAI API, SerpAPI, Supabase, Deno Edge Functions
Key innovation: Deterministic post-processor over LLM output; near-zero hallucination on product URLs
The problem

AI fashion recommendation is a solved research problem. Making it trustworthy in production is not.

The fashion recommendation space was crowded with LLM demos by 2025. The pattern was predictable: upload a photo, describe an occasion, and get a list of "recommended items" with beautiful descriptions and purchase links that either didn't exist, led to the wrong product, or pointed at sold-out items.

The interesting design question wasn't "how do we recommend better outfits?" It was "how do we architect an agent system where hallucination at the product level is structurally impossible?"

The architecture

The LLM proposes. SerpAPI grounds. A deterministic post-processor validates. Nothing else ships.

5-agent pipeline
1. Style Analyst: reads user profile, occasion, and existing wardrobe signals
2. Outfit Architect: designs the concept (palette, silhouette, category list), no products yet
3. Product Proposer: generates search queries per item, intentionally not URLs
4. SerpAPI Grounder: executes real product searches, returns live URLs and prices
5. Stylist Composer: writes the final recommendation using only verified products
Self-correction loop: If the Grounder returns fewer than 2 valid results for a category, the Proposer regenerates queries (max 2 loops).
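The self-correction loop can be sketched roughly as follows. This is a minimal illustration, not the production code: `ProductResult`, `ProposeQueries`, `GroundQueries`, and `groundCategory` are hypothetical names, and "max 2 loops" is read here as up to 2 regenerations after the first attempt.

```typescript
// Hypothetical shapes; the real project types are not shown in the write-up.
interface ProductResult {
  title: string;
  url: string;
  price: number;
}

// Stand-ins for the Product Proposer (agent 3) and SerpAPI Grounder (agent 4).
type ProposeQueries = (category: string, attempt: number) => Promise<string[]>;
type GroundQueries = (queries: string[]) => Promise<ProductResult[]>;

const MIN_VALID_RESULTS = 2; // fewer than this triggers a regeneration
const MAX_LOOPS = 2; // assumption: 2 regenerations on top of the first attempt

async function groundCategory(
  category: string,
  propose: ProposeQueries,
  ground: GroundQueries,
): Promise<ProductResult[]> {
  let best: ProductResult[] = [];
  for (let attempt = 0; attempt <= MAX_LOOPS; attempt++) {
    // The proposer only ever emits search queries, never URLs.
    const queries = await propose(category, attempt);
    const found = await ground(queries);
    if (found.length > best.length) best = found; // keep the fullest result set
    if (best.length >= MIN_VALID_RESULTS) break; // enough grounded products
  }
  // May still hold fewer than MIN_VALID_RESULTS items; the caller decides
  // whether to drop the category or ship with what was found.
  return best;
}
```

Bounding the loop keeps the worst case predictable: a category that never grounds costs at most three proposer calls and three searches.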
The deterministic post-processor: Before any recommendation ships, a non-LLM validation function checks every URL in the Stylist Composer output against the SerpAPI result set. Any URL not in the verified set is stripped. This is the fallback of last resort.
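A minimal sketch of such a post-processor, assuming the Composer output carries both free text and a structured item list. The `Recommendation` shape, the `stripUnverifiedUrls` name, and the URL regex are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical output shape for the Stylist Composer.
interface RecommendedItem {
  category: string;
  title: string;
  url: string;
}

interface Recommendation {
  text: string;
  items: RecommendedItem[];
}

// Simple URL matcher for prose; an assumption, not a full URL grammar.
const URL_PATTERN = /https?:\/\/[^\s)>\]"']+/g;

function stripUnverifiedUrls(
  rec: Recommendation,
  verifiedUrls: ReadonlySet<string>,
): Recommendation {
  // Structured items: keep only those whose URL came back from the grounder.
  const items = rec.items.filter((item) => verifiedUrls.has(item.url));

  // Free text: remove any URL that is not in the verified set.
  const text = rec.text.replace(URL_PATTERN, (match) => {
    // Sentence punctuation glued to the URL is not part of the URL itself.
    const url = match.replace(/[.,;:!?]+$/, "");
    const trailing = match.slice(url.length);
    return verifiedUrls.has(url) ? match : trailing;
  });

  return { text, items };
}
```

Because the check is plain set membership with no model in the loop, the same input always yields the same output, which is what makes it usable as a last line of defense.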
Latency engineering

Getting 5 agents under 10 seconds required careful parallelism

Running all 5 agents strictly in sequence would take 20–30 seconds. Total p95 latency in production: 8.4 seconds.

Agents 1–2: Sequential — each depends on prior output.
Agent 3 query generation: Parallelized per item category.
Agent 4 SerpAPI calls: All categories in parallel; the most latency-sensitive stage at ~2.5s avg.
Agent 5: Streams to UI as soon as all grounding is complete.
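The fan-out for agents 3 and 4 might look something like the sketch below, which uses `Promise.all` so that per-category calls overlap instead of queuing. The `Proposer` and `Grounder` signatures and the `proposeAndGround` helper are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real agent interfaces are not shown in the write-up.
interface GroundedProduct {
  title: string;
  url: string;
}

type Proposer = (category: string) => Promise<string[]>;
type Grounder = (queries: string[]) => Promise<GroundedProduct[]>;

async function proposeAndGround(
  categories: string[],
  propose: Proposer,
  ground: Grounder,
): Promise<Map<string, GroundedProduct[]>> {
  // Agent 3: one query-generation call per category, all started together.
  const proposals = await Promise.all(
    categories.map(async (category) => ({
      category,
      queries: await propose(category),
    })),
  );

  // Agent 4: one search per category, again all in flight at once.
  const grounded = await Promise.all(
    proposals.map(
      async ({ category, queries }) => [category, await ground(queries)] as const,
    ),
  );

  // Agent 5 would start composing only after this map is complete.
  return new Map(grounded);
}
```

With this shape, each of the two stages costs roughly as much as its slowest category rather than the sum of all categories.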
The lesson: Hallucination isn't a model problem — it's an architecture problem. If the LLM never has the chance to invent a URL, it can't.