Multi-Model Routing — BRVO Lab

Overview

An intelligent routing layer that picks the right model for each task, optimising cost and latency per request.

How it was built

The router is a middleware layer that sits between the application and the AI providers. When a request comes in, it classifies the task type (reasoning, vision, simple extraction or creative writing) and routes the request to the model best suited to that type: Claude Sonnet for complex reasoning, Haiku for simple classification and GPT-4o for image analysis. The classification step itself runs on Haiku, which adds only 50ms of overhead.
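The classify-then-route step described above can be sketched as a lookup table plus a pluggable classifier. This is a minimal illustration, not BRVO's implementation: the model labels are placeholders rather than real provider model IDs, `stub_classifier` is a hypothetical keyword-based stand-in for the Haiku classification call, and mapping creative writing to Sonnet is an assumption, since the text does not say which model handles it.

```python
from typing import Callable, Dict

Request = Dict[str, object]
Classifier = Callable[[Request], str]

# Task type -> model label. Labels are placeholders, not real model IDs.
ROUTES = {
    "reasoning": "sonnet",
    "vision": "gpt-4o",
    "extraction": "haiku",
    "creative": "sonnet",  # assumption: not specified in the text
}
FALLBACK_MODEL = "sonnet"  # assumption: unknown task types go to the strongest model


def stub_classifier(request: Request) -> str:
    """Hypothetical stand-in for the Haiku classification call."""
    if request.get("image") is not None:
        return "vision"
    text = str(request.get("text", "")).lower()
    if text.startswith("extract"):
        return "extraction"
    if text.startswith("write"):
        return "creative"
    return "reasoning"


def route(request: Request, classify: Classifier = stub_classifier) -> str:
    """Classify the request, then look up the model for that task type."""
    task_type = classify(request)
    return ROUTES.get(task_type, FALLBACK_MODEL)


if __name__ == "__main__":
    print(route({"text": "Extract the meta tags from this HTML"}))
    print(route({"text": "Describe this screenshot", "image": b"..."}))
    print(route({"text": "Why did organic traffic drop after the redesign?"}))
```

Keeping the classifier as a plain function argument means the keyword stub can be swapped for a real model call without touching the routing table.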

OpenRouter, Claude API, GPT-4o

What's being tested

The router is being tested on 500 real requests from the BRVO site audit tool. Current results: routing reduces the average cost per request by 41% compared with sending everything to Sonnet, with only a 3% drop in output quality (measured by human evaluation). The biggest win is on simple tasks: extracting meta tags from HTML costs £0.001 with Haiku versus £0.008 with Sonnet, at the same accuracy.

How BRVO uses this

Routing is being integrated into all BRVO AI features to reduce client operating costs. When a chatbot answers a simple FAQ, it uses Haiku; when it needs to reason about a complex customer problem, it routes to Sonnet. Clients get better AI at a lower monthly cost.

Use cases

High-traffic chatbot

A chatbot handling 10,000 messages per day. Simple greetings and FAQs go to Haiku (£0.001/msg). Complex product comparisons go to Sonnet (£0.008/msg). Monthly cost drops from £2,400 with everything on Sonnet to £1,400 with routing, with no quality loss on complex queries.
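The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes a 30-day month and that every message goes to either Haiku or Sonnet at the per-message prices quoted; the function names are illustrative only.

```python
HAIKU_COST = 0.001      # GBP per message, from the text
SONNET_COST = 0.008     # GBP per message, from the text
MESSAGES_PER_DAY = 10_000
DAYS_PER_MONTH = 30     # assumption: the text does not state the month length


def monthly_cost(haiku_share: float) -> float:
    """Monthly cost in GBP when `haiku_share` of messages go to Haiku."""
    messages = MESSAGES_PER_DAY * DAYS_PER_MONTH
    per_message = haiku_share * HAIKU_COST + (1 - haiku_share) * SONNET_COST
    return messages * per_message


def haiku_share_for(target_cost: float) -> float:
    """Share of messages that must go to Haiku to hit a monthly target."""
    messages = MESSAGES_PER_DAY * DAYS_PER_MONTH
    per_message = target_cost / messages
    return (SONNET_COST - per_message) / (SONNET_COST - HAIKU_COST)


if __name__ == "__main__":
    print(f"All Sonnet: £{monthly_cost(0.0):,.0f} per month")
    print(f"Haiku share needed for £1,400: {haiku_share_for(1400):.1%}")
```

Under these assumptions, sending everything to Sonnet comes to £2,400 a month, and the £1,400 figure corresponds to roughly 48% of messages being routed to Haiku.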

Document processing pipeline

An invoicing system processing 500 PDFs daily. Simple field extraction (date, amount, vendor) uses Haiku. Anomaly detection and fraud flagging route to Sonnet. The result is a 60% cost reduction.

Content moderation

A platform moderating user-generated content. Obviously safe content passes through Haiku instantly. Edge cases route to Sonnet for nuanced judgement. Faster moderation, lower cost, same safety.
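One way to express this pattern is a two-tier cascade: the cheap model answers first, and only content it is not confident is safe gets escalated. The sketch below is an assumption about how such a cascade could look, not a description of BRVO's system. The confidence score, the `ESCALATION_THRESHOLD` value and both moderator functions are hypothetical stand-ins for real model calls.

```python
from typing import Callable, NamedTuple, Tuple


class Verdict(NamedTuple):
    label: str         # "safe" or "unsafe"
    confidence: float  # 0.0 to 1.0


Moderator = Callable[[str], Verdict]

ESCALATION_THRESHOLD = 0.9  # hypothetical value; tune against labelled data


def moderate(text: str, cheap: Moderator, strong: Moderator,
             threshold: float = ESCALATION_THRESHOLD) -> Tuple[str, str]:
    """Return (label, tier). Only confidently safe content stops at the cheap tier."""
    first = cheap(text)
    if first.label == "safe" and first.confidence >= threshold:
        return first.label, "haiku"
    # Anything unsafe or uncertain is re-checked by the stronger model.
    return strong(text).label, "sonnet"


# Hypothetical stand-ins for the two model calls.
def fake_haiku(text: str) -> Verdict:
    if "hello" in text.lower():
        return Verdict("safe", 0.99)
    return Verdict("safe", 0.55)


def fake_sonnet(text: str) -> Verdict:
    if "scam" in text.lower():
        return Verdict("unsafe", 0.95)
    return Verdict("safe", 0.95)


if __name__ == "__main__":
    print(moderate("Hello everyone!", fake_haiku, fake_sonnet))
    print(moderate("Join my totally legit scam", fake_haiku, fake_sonnet))
```

Escalating on anything short of a confident "safe" verdict keeps the cheap tier from ever issuing the final decision on borderline content, which is what allows cost to fall without lowering the safety bar.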
