Spend Less on AI Without Settling for Less

Each new AI model seemingly arrives with the kind of excitement the iPhone used to generate. The difference is the pace. These releases now come quarterly, sometimes even monthly, and from a whole field of companies rather than one.

Yesterday, Anthropic released Claude Fable 5, part of the “Mythos” AI model family that has drawn so much attention in the news. As expected, Fable 5 sits atop nearly every AI benchmark. It also costs the most. That raises the question: does your business actually need it?

Table comparing five frontier AI models on knowledge-work performance (GDPval-AA score) and on cost in dollars per million input and output tokens. Claude Fable 5 leads on performance at 1932 and is the most expensive at $10 input and $50 output, while lighter models such as Claude Sonnet 4.6 and Gemini 3.5 Flash score lower but cost far less.

The table carries a fair amount of industry jargon, but the pattern underneath it is simple. The models that score highest on GDPval-AA, a benchmark that measures how well an AI handles real-world knowledge work, also tend to cost the most. It is much like staffing: a Senior Manager is more capable than an entry-level Analyst, but the Senior Manager also costs more. So why ever bring on an Analyst, if the Senior Manager is the stronger hire? The answer is familiar to anyone who has built a team. For certain types of work, an Analyst is the more cost-efficient fit than a highly capable, highly paid Senior Manager.

That is the idea behind “model routing”: send each task to the model best suited for it, rather than sending everything to the strongest model. Some work genuinely calls for the most advanced model, the same way some assignments are a poor fit for an entry-level Analyst. In those instances, a lighter model like Claude Sonnet or even an open-weight (free to download) model like Gemma would be a poor fit. Other work, like copying numbers from a PDF into a spreadsheet, is a much better fit for a lighter model. Handing that to a costly, high-capability model like Claude Opus or Claude Fable makes as little sense as asking a Senior Manager to do the data entry.

How much does this save? There is no universally quoted study or number just yet. However, personal experience and industry anecdotes point to somewhere between 40% and 70% off AI costs.
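For readers who want to see the mechanics, here is a minimal sketch of the routing idea in Python. It is an illustration under stated assumptions, not a production router: the Fable 5 prices come from the table above, while the lighter model's prices, the "low"/"high" complexity labels, the model names, and the token counts are all hypothetical placeholders chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # illustrative label, not a real API model ID
    input_price: float   # dollars per million input tokens
    output_price: float  # dollars per million output tokens


@dataclass(frozen=True)
class Task:
    description: str
    complexity: str      # "low" or "high" (hypothetical label set by a person or a classifier)
    input_tokens: int
    output_tokens: int


# Fable 5 prices are taken from the table above.
STRONG = Model("claude-fable-5", input_price=10.0, output_price=50.0)
# ASSUMPTION: placeholder prices for a lighter model, not real list prices.
LIGHT = Model("lighter-model", input_price=1.0, output_price=5.0)


def route(task: Task) -> Model:
    """Send high-complexity work to the strong model, everything else to the light one."""
    return STRONG if task.complexity == "high" else LIGHT


def always_strong(task: Task) -> Model:
    """Baseline: every task goes to the strongest model."""
    return STRONG


def cost(task: Task, model: Model) -> float:
    """Dollar cost of running one task on one model."""
    return (task.input_tokens * model.input_price
            + task.output_tokens * model.output_price) / 1_000_000


def total_cost(tasks, chooser) -> float:
    """Total dollar cost when `chooser` picks the model for each task."""
    return sum(cost(t, chooser(t)) for t in tasks)


# Hypothetical workload: one data-entry task and one demanding analysis.
tasks = [
    Task("Copy numbers from a PDF into a spreadsheet", "low", 20_000, 2_000),
    Task("Draft a complex technical accounting memo", "high", 50_000, 10_000),
]

baseline = total_cost(tasks, always_strong)
routed = total_cost(tasks, route)
print(f"Everything on the strong model: ${baseline:.2f}")
print(f"With routing:                   ${routed:.2f}")
print(f"Savings:                        {1 - routed / baseline:.0%}")
```

The savings in this toy example depend entirely on the made-up prices and workload, so they say nothing about the 40% to 70% range quoted above. The point is only that the more of your work qualifies as "low" complexity, the larger the gap between the two totals becomes.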

There is a quieter benefit too. When the routing draws on several different models (e.g., Claude, ChatGPT, Gemini, Gemma, Llama, Grok, Phi, and numerous others), the combined output can be stronger than that of any single model working alone. A recent study found the following: “Empirically, we show that heterogeneous configurations consistently outperform homogeneous scaling: 2 diverse agents can match or exceed the performance of 16 homogeneous agents.” While that finding comes from an emerging field of AI research, few would argue that the overarching concept of “model routing” is going away anytime soon.

Model routing is about being efficient. Wealth managers optimize portfolios for the most efficient way to balance risk and return. Accountants track schedules for the most efficient way to utilize billable time. CFOs allocate budgets for the most efficient way to deploy corporate capital. Shouldn’t we also consider the most efficient way to deploy AI?