Tag

#rag

11 posts

AI & Engineering Jun 18, 2026 · 7 min

A GenAI Project That Delivered Measurable Business Impact

A GenAI project succeeds or fails on whether you can name the business metric it moved — and prove it. The architecture is the price of entry, not the win.

AI & Engineering Jun 12, 2026 · 7 min

Data Governance for GenAI in Regulated Industries

In regulated, multi-tenant GenAI, governance isn't a constraint on the build — it's what decides whether you can ship. Design it into the first diagram.

AI & Engineering Jun 9, 2026 · 6 min

Hallucination is not a model problem — it's a system design problem

Ground the model in retrieved evidence, constrain its output, verify its claims, and measure everything. A layered defense against LLM hallucination.

AI & Engineering Jun 4, 2026 · 7 min

How We Actually Measure RAG Quality

RAG quality by vibes doesn't survive a second engineer. Decompose by stage, build the eval set from real failures, calibrate the LLM judge, and gate CI.

AI & Engineering May 16, 2026 · 7 min

Multilingual RAG, Where the Languages Are Code

Code is many languages plus an identifier dialect that tokenizers shred. Use code-specialized embeddings, identifier-aware retrieval, and structure-aware chunking.

AI & Engineering May 2, 2026 · 2 min

Why knowledge graphs beat vector-only RAG

Vector-only RAG returns flat, context-poor chunks ranked by similarity. Knowledge graphs model entities and relationships, so retrieval can traverse how and why things connect.

AI & Engineering Apr 24, 2026 · 6 min

Choosing a Vector Database: Benchmarks from My Own Workloads

Every vector DB benchmark was run on someone else's data. Benchmark your own vector count, dimensions, and — above all — your real filtering pattern.

AI & Engineering Apr 9, 2026 · 8 min

POC to Production: What Breaks When RAG Meets Real Users

A RAG demo proves the happy path exists. Production is everything else — tracing, drift, evals, and learning to say 'I don't have that part.'

AI & Engineering Mar 30, 2026 · 7 min

The Hallucination Problem: What Reduced Ours, What Didn't

Most LLM hallucination is a retrieval failure in disguise. Fix the context first, force citations, and give the model a sanctioned way to say 'I don't know.'

AI & Engineering Mar 18, 2026 · 7 min

Cutting LLM Inference Cost ~40% Without Losing Quality

Inference cost optimization is a measurement problem in disguise. Fix the quality metric first, then trim context, route models, and cache the stable prefix.

AI & Engineering Mar 5, 2026 · 6 min

RAG vs Fine-Tuning vs Long Context: My Decision Framework

'Should we fine-tune?' is usually the wrong first question. Ask instead: is the gap knowledge or behavior, and how often does the answer change?