RAG chatbot cost calculator
Estimate retrieval-heavy chatbot costs across prompt context, generated answers, cache hit rate, and model choice.
Scenario calculator
Retrieval-heavy chatbot with cacheable context.
Selected model: Gemini 3.5 Flash
Based on 12 months
Estimated cost: $45.04 / month
Per modeled task: $0.0005
Gemini 3.5 Flash has watchlist status, so verify its sources before procurement. Cached input pricing is not confirmed for this model, so it is excluded from the estimate. Llama 3.1 70B Instruct pricing is verified only through an aggregator, so verify its sources before procurement as well.
| Model | Provider | Source status | Input / 1M tokens | Output / 1M tokens | Monthly cost | Cost / task |
|---|---|---|---|---|---|---|
| Gemini 3.5 Flash | Google | Watchlist | $0.08 | $0.30 | $45.04 | $0.0005 |
| Llama 3.1 70B Instruct | Meta / Hosted APIs | Aggregator verified | $0.59 | $0.79 | $237.63 | $0.0024 |
| GPT-4o | OpenAI | Official stale | $2.50 | $10.00 | $1,550.15 | $0.0155 |
| Claude 3.5 Sonnet | Anthropic | Official stale | $3.00 | $15.00 | $1,967.09 | $0.0197 |
Cache impact: 20% · Selected models: 4
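The cache setting can be read as splitting each request's input tokens into cached and uncached shares. A minimal sketch of that arithmetic, assuming the 20% figure is a cache hit rate $h$ applied uniformly to input tokens and that cached input carries its own price $p_{\text{cached}}$ (the page does not publish its exact formula):

$$
C_{\text{month}} = \frac{R}{10^{6}}\Big[\,T_{\text{in}}\big((1-h)\,p_{\text{in}} + h\,p_{\text{cached}}\big) + T_{\text{out}}\,p_{\text{out}}\Big],
\qquad
C_{\text{task}} = \frac{C_{\text{month}}}{R}
$$

Here $R$ is monthly requests, $T_{\text{in}}$ and $T_{\text{out}}$ are input and output tokens per request, and all prices are in USD per 1M tokens. Where cached input pricing is not confirmed, as noted for Gemini 3.5 Flash, setting $p_{\text{cached}} = p_{\text{in}}$ removes the cache discount. The per-task relation is consistent with the table: GPT-4o's 1,550.15 USD over 100,000 requests is about 0.0155 USD per task.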
Hidden cost checklist
- Retrieved context tokens
- Embedding and reranking costs
- Cache misses
- Quality review loops
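Several checklist items can be folded into an estimate as simple add-ons. The sketch below is a minimal illustration under stated assumptions: retrieved context is modeled as a fixed number of equal-sized chunks, embedding is charged per query token, reranking is a flat fee per request, and review loops scale LLM spend by the share of answers that get a second call. Every function name, parameter, and example value is hypothetical, not a calculator input or a published provider price.

```python
"""Hypothetical add-ons for hidden costs in a RAG chatbot estimate.

All names and example values are illustrative assumptions,
not calculator inputs or published provider prices.
"""


def rag_input_tokens(base_prompt_tokens: int, chunks: int, tokens_per_chunk: int) -> int:
    """Input tokens per request: fixed prompt plus retrieved context."""
    return base_prompt_tokens + chunks * tokens_per_chunk


def retrieval_costs_per_month(requests: int, query_tokens: int,
                              embed_price_per_m: float,
                              rerank_price_per_request: float) -> float:
    """USD for embedding each query plus one reranking call per request."""
    embedding = requests * query_tokens * embed_price_per_m / 1_000_000
    reranking = requests * rerank_price_per_request
    return embedding + reranking


def with_review_loops(llm_cost: float, review_rate: float,
                      review_cost_ratio: float = 1.0) -> float:
    """Scale LLM spend when a share of answers triggers an extra review call.

    review_rate: fraction of requests that get reviewed (0.1 = 10%).
    review_cost_ratio: cost of a review call relative to the original call.
    """
    return llm_cost * (1 + review_rate * review_cost_ratio)


if __name__ == "__main__":
    # One hypothetical split that lands on the preset's 3,800 input tokens.
    print(rag_input_tokens(base_prompt_tokens=600, chunks=4, tokens_per_chunk=800))  # 3800
```

Retrieved context usually dominates the input side, so the chunk count and chunk size are the first knobs to check when the estimate looks high. Cache misses are not modeled here; they are governed by the cache hit rate.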
Related planning pages
Compare this scenario against the main AI model cost calculator, the provider pages, and the pricing change log before locking in a production budget.
RAG chatbot cost calculator FAQ
What assumptions does the RAG chatbot cost calculator use?
The default preset starts with 100,000 monthly requests, 3,800 input tokens, and 650 output tokens per request.
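To sanity-check these defaults, here is a minimal sketch that prices the preset at list rates. It assumes a simple linear formula (tokens times price per 1M tokens, with an optional cache hit rate blending in a cached-input price); `ModelPrice` and `monthly_cost` are hypothetical names, not part of ModelMeter, and the prices are copied from the comparison table.

```python
from dataclasses import dataclass
from typing import Optional

# Default preset from the FAQ.
REQUESTS_PER_MONTH = 100_000
INPUT_TOKENS = 3_800   # per request, including retrieved context
OUTPUT_TOKENS = 650    # per request


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_m: float                          # USD per 1M input tokens
    output_per_m: float                         # USD per 1M output tokens
    cached_input_per_m: Optional[float] = None  # None = cached price not confirmed


def monthly_cost(price: ModelPrice, requests: int, tokens_in: int,
                 tokens_out: int, cache_hit_rate: float = 0.0) -> float:
    """Estimated USD per month for one model at list prices."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # With no confirmed cached price, bill every input token at the full rate.
    if price.cached_input_per_m is None:
        cached_rate = price.input_per_m
    else:
        cached_rate = price.cached_input_per_m
    blended_in = (1 - cache_hit_rate) * price.input_per_m + cache_hit_rate * cached_rate
    input_cost = requests * tokens_in * blended_in / 1_000_000
    output_cost = requests * tokens_out * price.output_per_m / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # List prices from the comparison table; cached prices left unset.
    models = [
        ModelPrice("Gemini 3.5 Flash", 0.08, 0.30),
        ModelPrice("Llama 3.1 70B Instruct", 0.59, 0.79),
        ModelPrice("GPT-4o", 2.50, 10.00),
        ModelPrice("Claude 3.5 Sonnet", 3.00, 15.00),
    ]
    for m in models:
        total = monthly_cost(m, REQUESTS_PER_MONTH, INPUT_TOKENS, OUTPUT_TOKENS)
        print(f"{m.name}: ${total:,.2f}/month, ${total / REQUESTS_PER_MONTH:.4f}/task")
```

Run as a script, this prints plain list-price figures such as $49.90 per month for Gemini 3.5 Flash and $1,600.00 for GPT-4o. Both are higher than the table's $45.04 and $1,550.15, so the calculator presumably applies adjustments (such as the 20% cache setting or the 12-month basis) that this page does not spell out. Treat the sketch as a rough upper-bound check, not a reproduction of the calculator.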
Why does source status matter?
AI model prices change quickly. ModelMeter keeps watchlist, stale, aggregator, and official records visibly separate so estimates do not look more certain than the sources allow.