Today we’re announcing a partnership with LLMGateway. Every Clawship instance set to Platform mode now routes prompts through their gateway — and most teams will see their model bill drop by 40% to 60% without changing a single workflow.
Here’s the long version: why a single-model BYOK (bring your own key) setup quietly overspends, what a gateway actually does, and the math we ran before flipping the default for shared instances.
The BYOK trap nobody talks about
When you bring your own API key, you typically pin one model. It’s the path of least resistance — pick Claude Opus 4.6, paste the key, ship. The problem isn’t Opus. The problem is using Opus to answer “what are your business hours?”
Look at any production assistant’s logs and you’ll find the same shape:
- ~60% of messages are trivial. Greetings, repeat questions, tiny clarifications. A small model answers them perfectly.
- ~30% are medium. RAG lookups, structured-data fills, short summaries. A mid-tier model handles these with room to spare.
- ~10% are hard. Multi-step reasoning, code review, ambiguous intent. This is the only slice that genuinely needs a flagship.
Pinning Opus means the trivial 60% costs the same per token as the hard 10%. You’re paying for reasoning capacity you never use. Multiply that by ten thousand conversations a month and the bill stops looking small.
What an LLM gateway actually does
Per-request routing
Every prompt is classified by complexity and routed to the cheapest model that can answer it correctly. Trivial → Haiku-class. Medium → Sonnet-class. Hard → flagship.
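To make the idea concrete, here is a minimal Python sketch of complexity-based routing. The classifier is a deliberately naive keyword-and-length heuristic; the hint phrases, word-count thresholds, and tier labels are assumptions made for this example, not LLMGateway’s actual logic, which is presumably far more sophisticated.

```python
from enum import Enum


class Tier(Enum):
    TRIVIAL = "haiku-class"
    MEDIUM = "sonnet-class"
    HARD = "flagship"


# Hypothetical signals, chosen for illustration only.
# A production gateway would likely use a trained classifier instead.
HARD_HINTS = ("review this code", "step by step", "debug", "trade-off")
MEDIUM_HINTS = ("summarize", "summarise", "extract", "look up", "fill in")


def classify(prompt: str) -> Tier:
    """Guess the cheapest tier that can plausibly answer the prompt."""
    text = prompt.lower()
    words = len(text.split())
    if any(hint in text for hint in HARD_HINTS) or words > 400:
        return Tier.HARD
    if any(hint in text for hint in MEDIUM_HINTS) or words > 60:
        return Tier.MEDIUM
    return Tier.TRIVIAL


def route(prompt: str) -> str:
    """Return the model class the prompt would be sent to."""
    return classify(prompt).value
```

With these rules, `route("What are your business hours?")` lands on the small tier, while a request to review code goes to the flagship.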
Cross-provider failover
If Anthropic is rate-limiting or down, the gateway falls back to OpenAI, Google or DeepSeek without your code knowing. Your assistant stays online.
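In outline, failover is an ordered retry across providers. The sketch below shows that shape in Python; `ProviderError`, `complete_with_failover`, and the two stand-in provider functions are hypothetical names invented for the example, and a real gateway would add timeouts, backoff, and health checks.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call on rate limits or outages (hypothetical)."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in provider functions; real ones would call each vendor's API.
def flaky_anthropic(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def healthy_openai(prompt: str) -> str:
    return f"echo: {prompt}"
```

Calling `complete_with_failover("hi", [("anthropic", flaky_anthropic), ("openai", healthy_openai)])` skips the failing provider and returns the reply from the second one, which is the behaviour the caller never has to think about.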
Unified billing
One credit balance instead of five provider invoices. Every conversation has a per-token cost line you can audit.
Caching & batching
Identical prompts hit a semantic cache. Embedding calls are batched. Both cuts compound on top of routing.
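A true semantic cache matches prompts by embedding similarity. As a simpler stand-in, the sketch below caches on a normalized exact match, which is enough to show why repeated questions stop costing tokens. The class and method names are assumptions for this example, not part of any gateway API.

```python
import hashlib
from typing import Callable


class PromptCache:
    """Exact-match cache keyed on a normalized prompt (not a true semantic cache)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and case so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        """Return the cached reply, or call the model once and remember the result."""
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        reply = call_model(prompt)
        self._store[key] = reply
        return reply
```

Every hit is a model call that never happens, so the saving applies regardless of which tier the prompt would have been routed to.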
The math, with real numbers
Take a moderate workload: 10,000 conversations a month, average 6 turns per conversation, average 800 input tokens and 200 output tokens per turn. That’s 10,000 × 6 = 60,000 LLM calls a month.
Apply the 60 / 30 / 10 traffic split. Use rough November 2025 list prices (input / output per million tokens):
| Tier | Model class | Input $/1M | Output $/1M | Calls / month |
|---|---|---|---|---|
| Trivial | Haiku-class | $0.80 | $4.00 | 36,000 |
| Medium | Sonnet-class | $3.00 | $15.00 | 18,000 |
| Hard | Opus-class | $15.00 | $75.00 | 6,000 |
BYOK on Opus, every call
60,000 calls × (800 × $15 + 200 × $75) / 1,000,000 = ~$1,620 / month.
Platform Gateway, smart routing
- Trivial: 36,000 × (800 × $0.80 + 200 × $4.00) / 1M = ~$52
- Medium: 18,000 × (800 × $3.00 + 200 × $15.00) / 1M = ~$97
- Hard: 6,000 × (800 × $15.00 + 200 × $75.00) / 1M = ~$162
Total: ~$311 / month for the same workload — about 81% less than pinning Opus. Even with our ~3% margin on top, the bill lands near $320.
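If you want to rerun this arithmetic with your own traffic, the short Python script below reproduces the figures above. The prices and token counts are copied from this post; the function names are ours, and the 3% margin is treated as exact even though the real figure is approximate.

```python
# (input $/1M tokens, output $/1M tokens), copied from the table above
PRICES = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}
INPUT_TOKENS, OUTPUT_TOKENS = 800, 200  # average per call
TOTAL_CALLS = 10_000 * 6  # conversations x turns per conversation


def monthly_cost(model: str, calls: int) -> float:
    """Dollar cost of `calls` average-sized calls on one model class."""
    price_in, price_out = PRICES[model]
    return calls * (INPUT_TOKENS * price_in + OUTPUT_TOKENS * price_out) / 1_000_000


def routed_cost(split: dict[str, float], total_calls: int = TOTAL_CALLS) -> float:
    """Cost when traffic is divided across model classes by `split` (fractions summing to 1)."""
    return sum(
        monthly_cost(model, round(total_calls * share))
        for model, share in split.items()
    )


byok = monthly_cost("opus", TOTAL_CALLS)
routed = routed_cost({"haiku": 0.60, "sonnet": 0.30, "opus": 0.10})
savings = 1 - routed / byok
with_margin = routed * 1.03  # assumed flat 3% margin

print(f"BYOK: ${byok:,.2f}  routed: ${routed:,.2f}  "
      f"with margin: ${with_margin:,.2f}  savings: {savings:.0%}")
# prints: BYOK: $1,620.00  routed: $311.04  with margin: $320.37  savings: 81%
```

The split is the lever that matters most. Passing a harder mix such as `{"haiku": 0.40, "sonnet": 0.40, "opus": 0.20}` to `routed_cost` gives about $488, which brings the saving down to roughly 70%.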
Your numbers will vary, and this worked example sits above the range we’ve actually observed: savings between 40% and 75%, depending on prompt complexity, RAG context size, and how chatty the use case is. Customer support skews toward the high end. Code review skews toward the low end.
How Platform mode works inside Clawship
You’ll find an LLM Mode toggle on every instance:
- Platform — your prompts go through LLMGateway. We bill you against your Clawship credit balance at provider cost plus a thin margin (~3%) that covers Stripe fees, gateway routing, and infra. Default for shared instances on every paid plan.
- BYOK — your prompts go straight to your Anthropic, OpenAI, Google, xAI or DeepSeek key. Provider bills you directly. Default when you bring your own key on the free Sandbox or Community plan.
Switching takes one click and causes no downtime. Your conversation history, knowledge base, and workflows don’t care which mode they’re running under.
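For a sense of what a Platform-mode charge looks like per call, here is a small Python sketch of provider cost plus margin being debited from a prepaid balance. `CreditBalance` and `charge` are hypothetical names, the margin is fixed at 3% for illustration, and a real billing system would use integer currency units or `Decimal` rather than floats.

```python
from dataclasses import dataclass

MARGIN = 0.03  # the post quotes roughly 3%; treated as exact here


@dataclass
class CreditBalance:
    """Hypothetical prepaid balance, tracked in dollars."""

    dollars: float

    def charge(self, input_tokens: int, output_tokens: int,
               price_in: float, price_out: float) -> float:
        """Debit provider cost plus margin for one call; prices are $ per 1M tokens."""
        provider_cost = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
        billed = provider_cost * (1 + MARGIN)
        if billed > self.dollars:
            raise ValueError("insufficient credit")
        self.dollars -= billed
        return billed
```

An average-sized call (800 input, 200 output tokens) on a Sonnet-class model at $3 / $15 per million tokens costs $0.0054 at the provider, so the charge comes to a little under $0.0056.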
When BYOK is still the right call
Existing provider credits
If you already have a six-figure Anthropic or OpenAI commit, BYOK consumes that contract instead of adding a second invoice.
Compliance constraints
Some teams need every request to land at a specific provider in a specific region. BYOK gives you that pin.
Single-model evaluation
If you’re benchmarking one specific model’s behaviour, you don’t want a router silently swapping it out.
For everything else — and especially for high-volume conversational traffic — Platform mode is the cheaper, more resilient default.
Why we picked LLMGateway
We evaluated four gateways before signing. LLMGateway won on three things that matter for a multi-tenant platform:
- Honest pricing. Provider cost passthrough plus a transparent margin. No bundle pricing that hides where the money goes.
- Real failover. Cross-provider, not just cross-region. If Anthropic rate-limits, requests land on OpenAI without us writing retry logic.
- Per-tenant attribution. Their API gives us per-instance token usage so your dashboard can show exactly what each assistant costs.
Try it
Platform mode is live for everyone on Starter, Pro, and Business right now. Open an existing instance and flip the LLM Mode toggle, or spin up a new one.
Questions about your specific workload? Email [email protected] with rough traffic numbers and we’ll send back a real cost projection for both modes.