Tag

#evaluation

5 posts

AI & Engineering Jun 9, 2026 · 6 min

Hallucination is not a model problem — it's a system design problem

Ground the model in retrieved evidence, constrain its output, verify its claims, and measure everything. A layered defense against LLM hallucination.

AI & Engineering Jun 4, 2026 · 7 min

How We Actually Measure RAG Quality

RAG quality by vibes doesn't survive a second engineer. Decompose by stage, build the eval set from real failures, calibrate the LLM judge, and gate CI.

AI & Engineering Apr 9, 2026 · 8 min

POC to Production: What Breaks When RAG Meets Real Users

A RAG demo proves the happy path exists. Production is everything else — tracing, drift, evals, and learning to say 'I don't have that part.'

AI & Engineering Mar 30, 2026 · 7 min

The Hallucination Problem: What Reduced Ours, What Didn't

Most LLM hallucination is a retrieval failure in disguise. Fix the context first, force citations, and give the model a sanctioned way to say 'I don't know.'

AI & Engineering Mar 18, 2026 · 7 min

Cutting LLM Inference Cost ~40% Without Losing Quality

Inference cost optimization is a measurement problem in disguise. Pin down the quality metric first, then trim context, route models, and cache the stable prefix.