Gemini 3.5 Flash: frontier performance, but the cheap tier just got expensive

Slug (auto from title, or set manually)

gemini-3-5-flash-frontier-performance-pricing

Excerpt / summary (WordPress excerpt field, ~1–2 sentences)

Google’s new Gemini 3.5 Flash beats last year’s Pro model on coding and agentic tasks and runs noticeably faster — but it costs roughly 3× the Flash tier it replaces, which changes the calculus for anyone who built on Flash because it was cheap.


Set to true if you want this pinned as the hero “Trending” card; otherwise false.

Provider (ACF: provider)

Google

Model or product (ACF: modelOrProduct)

Gemini 3.5 Flash

Release date (ACF: releaseDate)

2026-05-19


Article body (WordPress main content editor)

Google announced the Gemini 3.5 family at I/O 2026 on May 19, and shipped Gemini 3.5 Flash to general availability the same day. The headline claim is striking for a “Flash” model: Google says 3.5 Flash beats the previous-generation Gemini 3.1 Pro on nearly all of its coding, agentic, and multimodal benchmarks, while running about four times faster measured in output tokens per second.

The benchmark numbers back most of that up. On the agentic and coding suite, 3.5 Flash posts 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning — all ahead of where the prior Flash tier sat, and competitive with last year’s frontier Pro. It handles text, image, audio, and video input with text output, ships with a roughly 1M-token input context window (about 65K output), and turns “dynamic thinking” on by default.

So far this reads like the usual story: each year’s budget model quietly absorbs last year’s frontier capability. The complication is the price. Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens. The model it effectively replaces, Gemini 3 Flash, ran at $0.50 / $3.00 — a 3× increase on the tier whose entire reason for existing was to be the cheap, route-everything-through-it option.

That matters more than a sticker-price comparison suggests, because agentic models don’t just cost more per token — they emit more tokens to do the same multi-step work. Artificial Analysis reported that running their evaluation suite on 3.5 Flash cost about $1,550, versus roughly $890 on 3.1 Pro. In other words, the “fast and cheap” model was, for their agentic workload, more expensive to run than the previous Pro model.

The honest read is that 3.5 Flash is a genuinely strong model that breaks an implicit contract. Teams who built pipelines on Flash specifically because it was the throwaway-cost tier now have to re-evaluate, and “Flash” no longer reliably signals “cheap.” For new projects where you’re choosing on capability-per-dollar against current frontier models, it looks competitive. For existing high-volume Flash pipelines, the upgrade could multiply your bill — so test against your actual workload before migrating.

We’ll revisit this once independent long-context and reliability testing settles, since Google’s own numbers show 3.5 Flash trailing Pro on long-context retrieval (MRCR v2) and on knowledge-heavy evaluations like Humanity’s Last Exam.


What changed (ACF: whatChanged — one bullet per line / repeater row)

  • Gemini 3.5 Flash reached general availability on May 19, 2026, launched at Google I/O 2026.
  • Google claims it beats the previous-generation Gemini 3.1 Pro on nearly all coding, agentic, and multimodal benchmarks.
  • Reported benchmarks include 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning.
  • Runs roughly 4× faster than comparable frontier models in output tokens per second; ~1M-token input context.
  • Pricing rose to $1.50 input / $9.00 output per 1M tokens — about 3× the $0.50 / $3.00 of the Gemini 3 Flash it replaces.

Practical value (ACF: practicalValue — one bullet per line / repeater row)

  • Strong capability-per-dollar for new coding and agentic projects benchmarked against current frontier models.
  • High speed and large context make it suitable for agent loops, tool use, and multimodal analysis.
  • Good default when you want near-frontier reasoning without paying top-tier Pro prices.

Caveats (ACF: caveats — one bullet per line / repeater row)

  • The ~3× price increase breaks the “Flash = cheap” assumption; existing high-volume Flash pipelines may see bills multiply.
  • Agentic workloads emit more tokens, so real costs can exceed the per-token comparison (one suite cost ~$1,550 vs ~$890 on 3.1 Pro).
  • Google’s own figures show it trailing Pro on long-context retrieval (MRCR v2) and knowledge-heavy evals (Humanity’s Last Exam).
  • Vendor-reported benchmarks; retest against your own workload before migrating production traffic.

Suggested reading time

The frontend calculates reading minutes automatically from the body length, so you don’t need to set this.

What changed

  • Gemini 3.5 Flash reached general availability on May 19, 2026, launched at Google I/O 2026.
  • Google claims it beats the previous-generation Gemini 3.1 Pro on nearly all coding, agentic, and multimodal benchmarks.
  • Reported benchmarks include 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning.
  • Runs roughly 4× faster than comparable frontier models in output tokens per second
  • ~1M-token input context.
  • Pricing rose to $1.50 input / $9.00 output per 1M tokens — about 3× the $0.50 / $3.00 of the Gemini 3 Flash it replaces.

Practical value

  • Strong capability-per-dollar for new coding and agentic projects benchmarked against current frontier models.
  • High speed and large context make it suitable for agent loops, tool use, and multimodal analysis.
  • Good default when you want near-frontier reasoning without paying top-tier Pro prices.

Caveats

  • The ~3× price increase breaks the "Flash = cheap" assumption
  • existing high-volume Flash pipelines may see bills multiply.
  • Agentic workloads emit more tokens, so real costs can exceed the per-token comparison (one suite cost ~$1,550 vs ~$890 on 3.1 Pro).
  • Google's own figures show it trailing Pro on long-context retrieval (MRCR v2) and knowledge-heavy evals (Humanity's Last Exam).
  • Vendor-reported benchmarks
  • retest against your own workload before migrating production traffic.