Gemini 3.5 Flash benchmark

Benchmark snapshot across popular evaluation categories. Higher scores are generally better unless noted; source status is tracked separately from SEO keyword demand.

Last updated: 2026-06-26

Benchmark snapshot

Benchmark	Gemini 3.5 Flash	Gemini 3.5 Pro	Kimi K2.6	GPT-5 (API)	GPT-4o
MMLU-Pro	87.3	90.2	84.6	89.6	86.1
GPQA	71.4	76.2	68.5	76.3	68.2
HumanEval+	92.1	94.5	88.2	94.5	90.7
AIME 2024	66	72.1	61.8	72.1	63.4

Latency vs cost

Chart legend: Gemini 3.5 Flash, Gemini 3.5 Pro, Kimi K2.6, GPT-5 (API), GPT-4o

This visual is an implementation placeholder for the launch chart. V1 keeps the data table crawlable and the methodology visible.

Methodology

ModelMeter stores benchmark snapshots with source URLs and capture dates. Watchlist models are clearly labeled when official confirmation is still required. Future refresh jobs run through Cloudflare Cron and Queues.
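As a rough illustration of the record described above, the sketch below models one snapshot row with a source URL, a capture date, and a watchlist label. The class name, field names, and labeling rule are hypothetical and do not reflect ModelMeter's actual schema. The score comes from the table on this page, and the page's last-updated date stands in for a capture date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BenchmarkSnapshot:
    """One benchmark score for one model (hypothetical schema)."""

    model: str
    benchmark: str
    score: float
    captured_on: date                   # date the score was recorded
    source_url: Optional[str] = None    # None until a source is attached
    officially_confirmed: bool = False  # assumed flag for vendor confirmation

    @property
    def status_label(self) -> str:
        # Assumed rule: a row counts as confirmed only when it has a source
        # URL and official confirmation; anything else stays on the watchlist.
        if self.source_url and self.officially_confirmed:
            return "confirmed"
        return "watchlist"


# Example row using a score from the table above; the capture date is a
# placeholder borrowed from the page's last-updated date.
row = BenchmarkSnapshot(
    model="Gemini 3.5 Flash",
    benchmark="MMLU-Pro",
    score=87.3,
    captured_on=date(2026, 6, 26),
)
print(row.status_label)
```

Keeping the label as a derived property rather than a stored field means a row cannot be marked confirmed without also carrying a source URL.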

