RAG chatbot cost calculator

Estimate retrieval-heavy chatbot costs across prompt context, generated answers, cache hit rate, and model choice.

Last updated: 2026-06-26

Scenario calculator

Calculate and compare

Retrieval-heavy chatbot with cacheable context.

Select models (4 selected)

Monthly requests

requests / month

Input tokens

tokens / request

Output tokens

tokens / request

Estimated monthly cost$45.04

Gemini 3.5 Flash

Annual budget$540.44

Based on 12 months

Cost per user$0.0038

Per active user / month

Cost per task$0.0005

Per modeled task

Cheapest alternativeGemini 3.5 Flash

$45.04 / month

Best value candidateGemini 3.5 Flash

$45.04 / month

Data last verified Jun 26, 2026 Source status visible on every result Watchlist prices are not procurement advice

Gemini 3.5 Flash is watchlist; verify sources before procurement. Cached input pricing is not confirmed for this model and is excluded from the estimate. Llama 3.1 70B Instruct is aggregator verified; verify sources before procurement.

Model cost comparison for this scenario
Model	Provider	Source status	Input / 1M	Output / 1M	Monthly cost	Cost / task
Gemini 3.5 Flash	Google	Watchlist	$0.08	$0.30	$45.04	$0.0005
Llama 3.1 70B Instruct	Meta / Hosted APIs	Aggregator verified	$0.59	$0.79	$237.63	$0.0024
GPT-4o	OpenAI	Official stale	$2.5	$10	$1,550.15	$0.0155
Claude 3.5 Sonnet	Anthropic	Official stale	$3	$15	$1,967.09	$0.0197

Cache impact: 20% · selected models: 4

Hidden cost checklist

Retrieved context tokens
Embedding and reranking costs
Cache misses
Quality review loops

Related planning pages

Compare this scenario against the main AI model cost calculator, provider pages, and the pricing change log before locking a production budget.