Streaming Chat Architecture

Status: Live
Stack: SSE, Claude API, OpenRouter

Server-Sent Events streaming with dual-provider fallback. The chat widget on this site is the live implementation.

The chat API accepts messages and streams the response back via Server-Sent Events. The primary provider is OpenRouter (for model flexibility and cost management), with a direct Anthropic SDK fallback if OpenRouter is down. The streaming parser handles both Anthropic's native event format and OpenAI's delta format, so the same client code works with any provider. First token appears in under 500ms.
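The dual-format parsing can be sketched as two small functions: one that unwraps `data:` lines off the SSE wire, and one that pulls the text increment out of either provider's event shape. The event shapes below are the documented Anthropic `content_block_delta` and OpenAI chat-completion chunk formats; the function names are illustrative, not the actual implementation.

```typescript
type SSEPayload = Record<string, any>;

// Parse one SSE line from the wire. Data lines look like `data: {...}`;
// OpenAI-style streams end with a literal `data: [DONE]` sentinel.
function parseSSELine(line: string): SSEPayload | null {
  if (!line.startsWith("data: ")) return null;
  const data = line.slice("data: ".length);
  if (data === "[DONE]") return null;
  return JSON.parse(data);
}

// Extract the incremental text from a parsed event, whichever provider
// produced it. Returns null for non-text events (pings, message_start,
// stop chunks, and so on).
function extractDelta(event: SSEPayload): string | null {
  // Anthropic native format: content_block_delta carries a text_delta
  if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
    return event.delta.text;
  }
  // OpenAI-style format (used by OpenRouter): choices[0].delta.content
  const text = event.choices?.[0]?.delta?.content;
  return typeof text === "string" ? text : null;
}
```

Because both formats reduce to the same `string | null`, the client loop that appends tokens to the UI never needs to know which provider answered.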


Live on brvo.co.uk as both the floating chat widget (Nerve) and the embedded chat on the contact page. Handles 50+ conversations per day. Uptime: 99.8% over the last 60 days (the 0.2% was an OpenRouter outage that the Anthropic fallback caught within 2 seconds). Average first-token latency: 420ms. The dual-provider architecture means the chat has never been fully down.

This architecture is deployed in every chatbot BRVO ships. Clients get streaming responses (feels instant), provider redundancy (never down), and cost optimisation (OpenRouter's aggregated pricing). The same codebase powers Nerve, the Ember Kitchen chatbot, and every future client chatbot.

Customer support chat

Any business adding AI chat to their website. Streaming means the customer sees the response forming in real time; it feels conversational rather than like waiting for a loading spinner.

Internal tool

A company building an internal AI assistant. The dual-provider fallback ensures the tool stays available even during API outages — critical for business-hours reliability.
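The fallback pattern itself is small enough to sketch generically: attempt the primary provider, and on any failure hand the same request to the secondary. This is a minimal sketch; `withFallback` and the provider stand-ins are hypothetical names, not the shipped code.

```typescript
type Provider<T> = () => Promise<T>;

// Try the primary provider; on any failure (network error, 5xx,
// timeout), run the fallback instead. Both functions are expected to
// produce the same response shape.
async function withFallback<T>(primary: Provider<T>, fallback: Provider<T>): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    // Primary is down: log and switch providers transparently
    console.warn("primary provider failed, falling back:", err);
    return fallback();
  }
}
```

In practice you would also race the primary against a short timeout, so a hanging connection (rather than a clean error) still triggers the fallback within a second or two.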

Multi-tenant SaaS

A platform offering AI chat to multiple clients. The architecture supports different system prompts per client while sharing the same streaming infrastructure.
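Per-client prompts on shared infrastructure can be as simple as a config lookup keyed by tenant, applied when the provider request is built. A minimal sketch follows; the tenant IDs, prompts, and model names here are illustrative placeholders, not real configuration.

```typescript
// Per-tenant configuration: each client gets its own system prompt
// (and optionally model), while the streaming pipeline is shared.
interface TenantConfig {
  systemPrompt: string;
  model: string;
}

const tenants: Record<string, TenantConfig> = {
  nerve: { systemPrompt: "You are Nerve, the BRVO site assistant.", model: "example-model-a" },
  ember: { systemPrompt: "You are the Ember Kitchen assistant.", model: "example-model-b" },
};

// Build the provider request body for one tenant's conversation.
// Everything downstream (SSE streaming, fallback) is tenant-agnostic.
function buildRequest(tenantId: string, messages: { role: string; content: string }[]) {
  const config = tenants[tenantId];
  if (!config) throw new Error(`unknown tenant: ${tenantId}`);
  return { model: config.model, system: config.systemPrompt, messages, stream: true };
}
```

The design choice worth noting: only the request-building step knows about tenants, so adding a client is a config entry rather than a code change.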

Want this for your business?

Start a sprint