arun mv

Making AI production-ready isn't about the model

Shipping production AI isn't about model benchmarks. It's about the reliability, retries, fallbacks, observability, and cost discipline that keep LLM systems alive.


TL;DR: the 30-second version

The hard part of shipping AI isn’t picking the model — it’s everything around it: retries, fallbacks, observability, and keeping the bill sane. The leaderboard rarely decides whether your system survives production. The system around the model does.

A lot of teams obsess over which model scores highest on a leaderboard. In production, that’s rarely the bottleneck. The bottleneck is the system around the model.

[Figure: the request path. A request passes through retries (backoff, circuit breaker) to the LLM call (rate limit, timeout, bad JSON), with a deterministic fallback on failure, and then on to the response. Observability: log prompt, model, latency, tokens, and outcome for every call.]
The model is one box on the request path. Bounded retries, a deterministic fallback, and a circuit breaker are what keep one slow provider from taking down the whole request.

Reliability first

LLM calls fail — rate limits, timeouts, malformed JSON. Treat them like any flaky network dependency: bounded retries with backoff, a deterministic fallback, and a circuit breaker so one slow provider doesn’t take down the request path. Doing this carefully cut our error rate by about 25%.
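One way to wire those three pieces together is sketched below. This is a minimal illustration, not the setup behind the numbers above: `CircuitBreaker`, `call_with_resilience`, and their parameters are hypothetical names, `call_llm` stands in for whatever client function you use, and the thresholds are placeholder defaults.

```python
import random
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then allows a
    trial call once `reset_after` seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a trial call through after the cool-down.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_resilience(call_llm, fallback, breaker, max_attempts=3,
                         base_delay=0.5, sleep=time.sleep):
    """Try call_llm at most max_attempts times, then use the fallback."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            break  # provider looks unhealthy: go straight to the fallback
        try:
            # Assumption: every exception is retryable. Real code would
            # catch only rate-limit, timeout, and parse errors.
            result = call_llm()
        except Exception:
            breaker.record_failure()
            if attempt < max_attempts - 1:
                # Exponential backoff with a little jitter.
                sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
            continue
        breaker.record_success()
        return result
    return fallback()
```

The fallback is just a function that returns something predictable (a cached answer, a template, a "try again later" message), so the caller always gets a response within a bounded number of attempts.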

Make it observable

You can’t fix what you can’t see. Log prompts, model, latency, token counts, and outcome for every call. When quality regresses, you want to diff prompts and inputs — not guess.
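A small wrapper is usually enough to get one structured record per call. The sketch below is illustrative only: `logged_call` is a hypothetical helper, and it assumes `call_llm(prompt, model)` returns a dict with `input_tokens` and `output_tokens` keys, which will differ by provider.

```python
import json
import logging
import time

logger = logging.getLogger("llm_calls")


def logged_call(call_llm, prompt, model):
    """Run call_llm(prompt, model) and emit one JSON log line per call.

    Assumes the response is a dict carrying "input_tokens" and
    "output_tokens" (a hypothetical shape, adjust to your client).
    """
    record = {"prompt": prompt, "model": model, "outcome": "ok"}
    start = time.perf_counter()
    try:
        response = call_llm(prompt, model)
        record["input_tokens"] = response.get("input_tokens")
        record["output_tokens"] = response.get("output_tokens")
        return response
    except Exception as exc:
        # Record the failure type, then let the caller handle it.
        record["outcome"] = f"error:{type(exc).__name__}"
        raise
    finally:
        # Runs on success and failure, so every call is logged.
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        logger.info(json.dumps(record))
```

Because each line is JSON, diffing prompts and inputs across a regression becomes a query over the logs rather than guesswork. In a real system you would likely redact or sample prompts that contain user data before logging them.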

Spend like it’s your money

Model selection is a cost lever, not just a quality lever. Routing easy requests to a cheaper model and reserving the expensive one for hard cases saved us $5,000+/yr without users noticing.
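A router can start out very simple. The sketch below is an assumption about what "easy" and "hard" might mean, not the rule we actually used: the model names, the word-count threshold, and the keyword list are all hypothetical placeholders you would tune against your own traffic.

```python
CHEAP_MODEL = "small-model"      # hypothetical model names
EXPENSIVE_MODEL = "large-model"


def is_hard(request_text, max_easy_words=200,
            hard_markers=("prove", "refactor", "multi-step")):
    """Crude difficulty heuristic: long inputs or certain keywords count as hard."""
    text = request_text.lower()
    too_long = len(text.split()) > max_easy_words
    has_marker = any(marker in text for marker in hard_markers)
    return too_long or has_marker


def pick_model(request_text):
    """Send hard requests to the expensive model, everything else to the cheap one."""
    return EXPENSIVE_MODEL if is_hard(request_text) else CHEAP_MODEL
```

For example, `pick_model("Summarize this sentence.")` selects the cheap model, while a request containing "refactor" goes to the expensive one. Logging which model served each request makes it possible to check that the cheaper route is not hurting quality.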

The unglamorous work is the work. The model is the easy part.

