Claude Opus 4.7: Full Benchmark Scorecard and Migration Guide — The AI Crown Changes Hands Again

You saw the reel. Now here's the full story. Claude Opus 4.7 is the most capable generally available LLM on the planet — narrowly edging out GPT-5.4 and Gemini 3.1 Pro across coding, vision, and agentic benchmarks. This article gives you the complete scorecard, every breaking change, and a clear framework for deciding whether to upgrade today or wait.

Release Summary

Released: April 16, 2026
Model ID: claude-opus-4-7
Context window: 1M input tokens, 128K output tokens
Pricing: $5 per million input tokens, $25 per million output tokens (identical to Opus 4.6)
Available on: Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry
Position: Narrowly retook the lead for most powerful generally available LLM from GPT-5.4

Same price as the previous generation. That matters. Anthropic is betting on capability, not margin expansion.

The Full Benchmark Scorecard

Benchmark	Opus 4.6	Opus 4.7
CursorBench (coding)	58%	70% (+12 pts)
SWE-bench Verified	—	87.6%
Rakuten-SWE-Bench (production tasks)	1x	3x more tasks resolved
XBOW visual-acuity	54.5%	98.5%
Document reasoning errors	baseline	21% fewer
Vision max resolution	1.15 MP (1568px)	3.75 MP (2576px)

The two numbers that matter most are CursorBench and XBOW. A 12-point jump on CursorBench means the model went from "useful coding assistant" to "reliable coding agent": the difference between needing to review every line and being able to trust multi-file refactors. The XBOW leap from 54.5% to 98.5% isn't an incremental improvement. It's a category change. This model can now see screenshots the way a human developer does. For anyone building computer-use agents, that alone justifies the upgrade.

What's Actually New

Vision got real

Opus 4.7 is the first Claude with high-resolution image support: 2576 pixels on the long edge (about 3.75 MP), more than 3x the pixel count of prior models, which topped out at 1568 pixels (1.15 MP). Coordinates now map 1:1 to pixels, which removes the scale-factor math that made computer-use agents fragile. Screenshots, dense diagrams, and documents come through at actual fidelity.
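
As a rough illustration of the resolution limit, here is a minimal sketch of a client-side pre-resize step. The helper name fit_long_edge is hypothetical, and the article does not say whether the API downsizes larger images on its own; the assumption here is that you want the image you send to stay within 2576 pixels on the long edge so that returned coordinates line up with the pixels you actually sent.

```python
def fit_long_edge(width: int, height: int, max_edge: int = 2576) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_edge.

    Hypothetical helper: 2576 is the long-edge limit quoted in the article.
    Images already within the limit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    # Scale both sides by the same ratio to preserve the aspect ratio.
    new_width = max(1, round(width * max_edge / long_edge))
    new_height = max(1, round(height * max_edge / long_edge))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_long_edge(3840, 2160))  # a 4K screenshot
    print(fit_long_edge(1920, 1080))  # already small enough
```

Resizing before upload, rather than after, keeps the mapping between model coordinates and screen coordinates under your control.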

If you've ever fed a complex dashboard screenshot to Claude and gotten back a hallucinated reading of a chart label, this is the fix. The model no longer squints at your images.

New xhigh effort level

A new tier sits between high and max. Anthropic recommends xhigh as the default for coding and agentic use. It's a Messages API-only feature. Think of it as the sweet spot: more thorough than high without the latency cost of max.

Task budgets (public beta)

Give the model an advisory token budget for an entire agentic loop — thinking, tool calls, tool results, and final output included. The model sees a running countdown and paces itself. This solves the runaway-agent problem where a loop burns through tokens on early steps and has nothing left for the final answer.

Use the beta header task-budgets-2026-03-13. Minimum budget is 20,000 tokens.
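
A minimal sketch of the opt-in side of task budgets, under stated assumptions: the header value and the 20,000-token minimum come from this article, and anthropic-beta is the header the Anthropic API uses for beta opt-ins. The helper name task_budget_headers is hypothetical, and the request field that carries the budget itself is not given here, so the sketch stops at validation and headers.

```python
TASK_BUDGET_BETA = "task-budgets-2026-03-13"  # beta header value quoted in the article
MIN_TASK_BUDGET = 20_000                      # minimum budget quoted in the article


def task_budget_headers(budget_tokens: int) -> dict[str, str]:
    """Validate a task budget and return the headers that opt in to the beta.

    Hypothetical helper. It does not build the request body, because the
    article does not name the field that carries the budget.
    """
    if budget_tokens < MIN_TASK_BUDGET:
        raise ValueError(
            f"task budget must be at least {MIN_TASK_BUDGET} tokens, got {budget_tokens}"
        )
    return {"anthropic-beta": TASK_BUDGET_BETA}


if __name__ == "__main__":
    print(task_budget_headers(50_000))
```

Checking the minimum locally turns a rejected request into an immediate, readable error.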

Breaking Changes You Need to Know

These will break your code if you don't address them before switching model IDs.

Extended thinking budgets are removed. Adaptive thinking is the only supported thinking mode now. If your code sets a fixed budget_tokens value, the request will fail.

# Old (breaks on Opus 4.7)
thinking={"type": "enabled", "budget_tokens": 8000}

# New
thinking={"type": "adaptive"}
# Pair with:
output_config={"effort": "high"}
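
If request parameters are assembled as plain dictionaries, the swap above can be applied mechanically. The following is a sketch, not Anthropic's migration tooling: migrate_thinking is a hypothetical helper, and the parameter shapes are taken from the snippet above.

```python
from copy import deepcopy


def migrate_thinking(params: dict, effort: str = "high") -> dict:
    """Return a copy of params with a fixed thinking budget replaced by adaptive thinking.

    Hypothetical helper. Only rewrites params whose thinking type is "enabled";
    anything else is returned as an unchanged copy.
    """
    migrated = deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        migrated["thinking"] = {"type": "adaptive"}
        # Keep an effort level the caller already chose; otherwise set one.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


if __name__ == "__main__":
    old = {
        "model": "claude-opus-4-7",
        "thinking": {"type": "enabled", "budget_tokens": 8000},
    }
    print(migrate_thinking(old))
```

Returning a copy instead of mutating in place makes it easy to log the before and after while you test the migration.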

Setting temperature, top_p, or top_k to a non-default value returns a 400 error. If you've been passing temperature=0 for "determinism," remove it. It never guaranteed deterministic output anyway; it only narrowed the sampling distribution.
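
One way to catch stray sampling settings before they reach the API is to filter them out where request parameters are built. This is a sketch with a hypothetical helper name, strip_sampling_params; it simply drops the three keys named above and reports which ones it removed.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def strip_sampling_params(params: dict) -> tuple[dict, list[str]]:
    """Return (cleaned_params, removed_keys) without mutating the input.

    Hypothetical helper: drops the sampling parameters that the article says
    trigger 400 errors when set to non-default values.
    """
    cleaned = {key: value for key, value in params.items() if key not in SAMPLING_PARAMS}
    removed = [key for key in SAMPLING_PARAMS if key in params]
    return cleaned, removed


if __name__ == "__main__":
    cleaned, removed = strip_sampling_params(
        {"model": "claude-opus-4-7", "max_tokens": 1024, "temperature": 0}
    )
    print(cleaned, removed)
```

Logging the removed keys is useful during migration, since it shows which call sites were relying on them.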

Thinking content is omitted from responses by default. If your application displays or logs the model's reasoning, add "display": "summarized" to the thinking config to restore it.

The new tokenizer produces 1.0x to 1.35x as many tokens for the same content. If your max_tokens values are tight, raise them by about 35% to cover the worst case, or responses may be truncated.
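
The headroom arithmetic is simple enough to sketch. The helper name scaled_max_tokens is hypothetical; the 1.35 factor is the top of the range above, and the ceiling assumes the 128K output limit from the release summary means 128,000 tokens.

```python
import math

TOKENIZER_WORST_CASE = 1.35  # upper end of the range quoted in the article
MAX_OUTPUT_TOKENS = 128_000  # assumption: "128K output tokens" read as 128,000


def scaled_max_tokens(
    old_max_tokens: int,
    factor: float = TOKENIZER_WORST_CASE,
    ceiling: int = MAX_OUTPUT_TOKENS,
) -> int:
    """Scale a max_tokens value for the new tokenizer, capped at the output limit."""
    if old_max_tokens <= 0:
        raise ValueError("old_max_tokens must be positive")
    # Round up so the new limit never falls short of the scaled value.
    return min(ceiling, math.ceil(old_max_tokens * factor))


if __name__ == "__main__":
    print(scaled_max_tokens(4096))
    print(scaled_max_tokens(120_000))
```

For example, a limit of 4,096 tokens becomes 5,530, while a limit that is already near the output ceiling is clamped to the ceiling.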

Adaptive thinking is off by default. You must explicitly enable it with thinking: {"type": "adaptive"}. Don't assume the model thinks deeply out of the box; without that setting, it won't.
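
Because thinking is now both opt-in and hidden by default, it may help to build the thinking config in one place. A small sketch, assuming the "type" and "display" keys behave as described above; thinking_config is a hypothetical helper name.

```python
def thinking_config(show_reasoning: bool = False) -> dict:
    """Build a thinking config that enables adaptive thinking explicitly.

    Hypothetical helper. When show_reasoning is True, it also asks for
    summarized thinking content, using the "display" key described above.
    """
    config = {"type": "adaptive"}
    if show_reasoning:
        config["display"] = "summarized"
    return config


if __name__ == "__main__":
    print(thinking_config())
    print(thinking_config(show_reasoning=True))
```

Applications that log or display reasoning would pass show_reasoning=True; everything else can use the default.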

Behavior Changes

These won't throw errors, but your prompts may produce different output.

More literal instruction following. Opus 4.7 does what you say, not what it thinks you meant. Vague prompts get worse results. Precise prompts get dramatically better results.
Response length calibrates to task complexity. Short questions get short answers. The model is less verbose by default.
Fewer tool calls by default. If your agent relies on the model proactively using tools, raise the effort level.
More direct, opinionated tone. Less warm-and-fuzzy than 4.6. If your use case needs diplomatic language, specify it in your system prompt.
More regular progress updates during long agentic traces.
Fewer subagents spawned by default. The model is more conservative about parallelizing work unless you ask for it.

The theme: Opus 4.7 is more disciplined. It follows instructions more tightly, talks less, and acts more deliberately. If your prompts were already precise, you'll see better output. If they were loose, you'll need to tighten them.

Should You Upgrade?

Upgrade immediately if:

You build agents that write code or use a computer
You process documents, charts, screenshots, or diagrams
You run long agentic loops where token budget matters
You were hitting instruction-following issues with 4.6

Wait and test if:

Your production prompts depend on the old tokenizer's token counts
You rely on explicit thinking budgets (budget_tokens)
You set temperature for "determinism" (remove it — but test first)

For most developers, the upgrade path is straightforward: swap the model ID, remove temperature settings, switch to adaptive thinking, and bump your max_tokens headroom. A 30-minute migration for a meaningful capability jump.

The Mythos Connection

Opus 4.7 is the first Claude shipping with automated detection and blocking for prohibited cybersecurity uses — directly inherited from Mythos Preview and Project Glasswing. The model actively refuses to assist with vulnerability exploitation, malware development, and attack tooling outside of authorized contexts.

If you build legitimate security tooling — penetration testing, red teaming, vulnerability scanning — and you're hitting refusals, apply to Anthropic's Cyber Verification Program for whitelisted access.

The Bottom Line

The upgrade is a no-brainer for agentic coding and anything involving vision. Watch your token usage on the new tokenizer: a multiplier of up to 1.35x adds up on high-volume workloads.
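
To put a number on the tokenizer effect, here is a back-of-the-envelope cost sketch using the prices from the release summary. The helper name estimate_cost is hypothetical, and the token_multiplier argument is an assumption about your workload: 1.0 for the best case, 1.35 for the worst case the article quotes.

```python
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (from the release summary)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (from the release summary)


def estimate_cost(input_tokens: int, output_tokens: int, token_multiplier: float = 1.0) -> float:
    """Estimate the USD cost of a workload measured in old-tokenizer tokens.

    Hypothetical helper: token_multiplier scales both input and output counts
    to approximate how the new tokenizer would count the same content.
    """
    scaled_input = input_tokens * token_multiplier
    scaled_output = output_tokens * token_multiplier
    return (scaled_input * INPUT_PRICE_PER_MTOK + scaled_output * OUTPUT_PRICE_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    best = estimate_cost(1_000_000, 200_000)
    worst = estimate_cost(1_000_000, 200_000, token_multiplier=1.35)
    print(f"best case: ${best:.2f}, worst case: ${worst:.2f}")
```

For a workload of one million input tokens and 200,000 output tokens, that is a spread between $10.00 and roughly $13.50, even though the per-token price is unchanged.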

---

Sources

Anthropic announcement
