DeepSeek V4: $5.6M Training Cost, 80.6% on SWE-bench, 7x Cheaper Than Claude Opus
By easyAI Team · 11 min read · 2026-04-24
DeepSeek trained a model for $5.6 million that scores within 0.2 percentage points of Claude Opus 4.6 on SWE-bench Verified. The output price is $3.48 per million tokens. Claude Opus charges $25. The model weights are open for anyone to download.
That combination of facts, all at once, is why this matters. Not because one model is "better" than another, but because the cost assumptions behind frontier AI just got challenged in a way that's hard to ignore.
What Is DeepSeek V4?
DeepSeek V4 is a preview release from DeepSeek, the Chinese AI lab that shook the industry with its R1 model in early 2025. V4 comes in two versions.
V4-Pro is the flagship. 1.6 trillion parameters, 1 million token context window. This is the model that hits the SWE-bench numbers. It's built for heavy workloads: coding, reasoning, long-document analysis.
V4-Flash is the lightweight version. 284 billion parameters, optimized for low-latency tasks. Think fast API responses, real-time applications, high-volume processing where speed matters more than maximum capability.
Both models are open-weights. That means anyone can download the model parameters and run them on their own hardware. This is structurally different from Claude or GPT, where you can only access the model through the company's API. Open-weights means a university researcher, a startup, or a government can take V4 and deploy it without paying DeepSeek anything.
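To make "open-weights" concrete, here's a minimal sketch of what self-hosting could look like with vLLM, a common open-source inference server. The weights location is a hypothetical Hugging Face repo id (check DeepSeek's actual release for the real one), and it uses V4-Flash because even the smaller model needs serious multi-GPU hardware:

```python
# Minimal self-hosting sketch using vLLM. No API calls to DeepSeek;
# the model runs entirely on your own hardware.
# "deepseek-ai/DeepSeek-V4-Flash" is a HYPOTHETICAL repo id -- the
# actual release location isn't confirmed here.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical weights location
    tensor_parallel_size=8,                 # shard the 284B model across 8 GPUs
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize this incident report: ..."], params)
print(outputs[0].outputs[0].text)
```

The point isn't this specific stack; it's that nothing in the loop above touches DeepSeek's servers, which is exactly what closed API-only models can't offer.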
The release date is April 24, 2026. It's labeled as a preview, not a final release.
Why Is the Training Cost a Big Deal?
DeepSeek reports the total training cost for V4 at $5.6 million, using 16,000 GPUs (a count stated in Hopper-equivalent terms; more on the actual hardware below).
For comparison, training costs for U.S. frontier models are estimated in the hundreds of millions of dollars. Exact figures are proprietary, but industry estimates for GPT-5 class models range from $200M to over $500M. Anthropic and Google don't publish their training costs either, but they're operating at similar scales.
$5.6M vs hundreds of millions is not a rounding error. It's a difference in category. If these numbers hold up under scrutiny, it means the cost floor for training a frontier-competitive model is dramatically lower than what U.S. labs are spending.
There are multiple possible explanations. DeepSeek may have found more efficient training methods. The Mixture-of-Experts architecture (where only a fraction of parameters activate per token) reduces compute requirements. Or the cost comparison may not be apples-to-apples, since different labs count different things in their "training cost" figures.
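A back-of-envelope calculation shows why MoE matters for the cost question. It uses the standard ~6·N·D approximation for transformer training compute (N = parameters active per token, D = training tokens). DeepSeek hasn't published V4's active parameter count or token budget, so the numbers below are illustrative assumptions, not reported figures:

```python
# Why MoE cuts training compute: ~6 * N_active * D FLOPs approximation.
TOTAL_PARAMS = 1.6e12    # V4-Pro total parameters (reported)
ACTIVE_PARAMS = 50e9     # ASSUMED active params per token (not published)
TRAIN_TOKENS = 15e12     # ASSUMED training token count (not published)

dense_flops = 6 * TOTAL_PARAMS * TRAIN_TOKENS   # dense model, same size
moe_flops = 6 * ACTIVE_PARAMS * TRAIN_TOKENS    # MoE, only active params count

print(f"Dense-equivalent training FLOPs: {dense_flops:.2e}")
print(f"MoE training FLOPs:              {moe_flops:.2e}")
print(f"Compute reduction: {dense_flops / moe_flops:.0f}x")
# Under these assumptions, MoE needs ~32x less compute than a dense
# 1.6T model -- one plausible way a $5.6M budget stretches that far.
```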
The number is worth taking seriously. It's also worth questioning.
How Does It Actually Perform?
The headline benchmark is SWE-bench Verified, which tests whether a model can solve real GitHub issues from open-source projects. It's one of the most respected coding benchmarks because it uses actual production problems, not synthetic tests.
V4-Pro: 80.6%. Claude Opus 4.6: 80.8%.
That's a 0.2 percentage point gap. In practical terms, the two models solve almost the same set of problems. DeepSeek also claims V4-Pro performs at a level comparable to Gemini Pro 3.1 across broader evaluations.
This is not the profile of a "cheap but worse" model. When the performance gap on a credible benchmark is less than half a percentage point, the conversation shifts from "is it good enough?" to "why does the other one cost 7x more?"
What Do the Prices Look Like?
Here's the per-million-token pricing side by side.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| V4-Flash | $0.14 | $0.28 |
| V4-Pro | $1.74 | $3.48 |
| Claude Opus 4.6 | $5.00 | $25.00 |
V4-Pro output costs $3.48 per million tokens. Claude Opus output costs $25.00. That's roughly a 7.2x price difference on the output side.
For a developer running a few queries a day, the difference is a few dollars. For a company running agent workflows that process millions of tokens daily, the difference is thousands of dollars per month. At enterprise scale, 7x cheaper changes the economics of which models you can afford to run on which tasks.
V4-Flash at $0.28 output is in a different category entirely. That's cheap enough to use as a high-volume preprocessing layer, a triage agent, or a real-time classification engine where you'd never deploy a $25/M model.
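To put those rates in concrete terms, here's a quick monthly-cost calculation using the table's prices. The workload mix (10M input and 2M output tokens per day) is an assumed example, not a measurement:

```python
# Monthly cost comparison at an assumed agent workload.
# Prices are USD per 1M tokens, taken from the table above.
PRICES = {  # model: (input, output)
    "V4-Flash":        (0.14, 0.28),
    "V4-Pro":          (1.74, 3.48),
    "Claude Opus 4.6": (5.00, 25.00),
}

IN_TOKENS_PER_DAY = 10_000_000   # assumed workload
OUT_TOKENS_PER_DAY = 2_000_000   # assumed workload
DAYS = 30

for model, (p_in, p_out) in PRICES.items():
    monthly = DAYS * (IN_TOKENS_PER_DAY / 1e6 * p_in
                      + OUT_TOKENS_PER_DAY / 1e6 * p_out)
    print(f"{model:>16}: ${monthly:,.2f}/month")

# V4-Flash:           $58.80/month
# V4-Pro:            $730.80/month
# Claude Opus 4.6: $3,000.00/month
```

At this assumed volume, the Opus-to-V4-Pro gap is about $2,270 a month per workload, which is where "7x cheaper" stops being abstract.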
What's the Huawei Chip Story?
This is the part that gets geopolitical, but the facts are straightforward.
Reuters confirmed on April 4, 2026, that DeepSeek's actual training infrastructure runs on Huawei Ascend 950PR chips, not NVIDIA hardware. The $5.6M cost figure and the 16,000-GPU count are stated in Hopper-generation equivalents for comparison purposes, but the physical chips are Chinese-made.
The context: the U.S. has export controls restricting sales of advanced NVIDIA chips (H100, H200, and successors) to China. These controls were designed to slow China's AI development by limiting access to the best training hardware.
DeepSeek training a frontier-competitive model on Huawei chips is a data point. It suggests the export controls haven't created the bottleneck they were intended to create, at least not at DeepSeek's scale. Whether this generalizes to all Chinese AI labs or is specific to DeepSeek's engineering approach is an open question.
What Does This Mean for OpenAI and Anthropic?
Two assumptions come under pressure.
The training cost assumption. If a frontier-quality model can be trained for $5.6M, the argument that you need billions in funding to compete at the frontier weakens. That doesn't mean OpenAI and Anthropic are wasting money. They may be investing in capabilities that DeepSeek hasn't matched on benchmarks that DeepSeek didn't publish. But the "only well-funded U.S. labs can build frontier models" premise takes a hit.
The pricing assumption. Claude Opus at $25/M output and V4-Pro at $3.48/M output, with near-identical SWE-bench scores, creates direct pricing pressure. Enterprise customers running large-scale AI workloads will ask why they're paying 7x more. The answer might be reliability, safety, support, or capabilities on tasks beyond SWE-bench. But the question will be asked.
The open-weights dimension adds a third layer. Claude and GPT are API-only. V4 can be self-hosted. For organizations with data privacy requirements, regulatory constraints, or sovereignty concerns, self-hosting is not a nice-to-have. It's a requirement. Open-weights gives DeepSeek access to markets that closed-model companies structurally cannot serve.
What Are the Reasons for Skepticism?
Several, and they're legitimate.
Benchmark selection matters. DeepSeek published SWE-bench Verified, where V4-Pro scores well. There are dozens of other benchmarks. Performance on one test doesn't guarantee performance on all tasks. Until independent reviewers run V4 across a broader evaluation suite, the 80.6% number is one data point, not a complete picture.
It's a preview. This is not a final release. Preview models can have bugs, instability, or performance quirks that get resolved (or don't) in the production version. Evaluating a preview as if it's a finished product overstates the case.
Cost accounting is opaque. $5.6M may or may not include research costs, failed training runs, data preparation, or infrastructure amortization. Different companies count different things. Without standardized cost reporting, direct comparisons are directional, not precise.
Regulatory and political risk. DeepSeek operates in China. Chinese AI companies face their own regulatory environment, content restrictions, and geopolitical exposure. For companies outside China, depending on a Chinese model provider introduces supply chain considerations that don't apply to U.S. or European alternatives.
What Does This Mean for You?
If you're a developer: The cost of running frontier-quality AI just dropped. V4-Pro at $3.48/M is cheap enough to experiment with agent architectures that would be prohibitively expensive on Claude or GPT. V4-Flash at $0.28/M opens up use cases where you wouldn't run an LLM at all before, like the triage sketch below.
If you're an investor: The premium that U.S. AI labs command is partly based on the assumption that frontier capability requires frontier budgets. DeepSeek V4 challenges that assumption. Whether it changes valuations depends on whether V4's performance holds up across real-world deployment, not just benchmarks.
If you're a regular user: More competition means better models at lower prices. Whether you use DeepSeek directly or not, its existence puts downward pressure on what everyone else charges. The model you use next year will likely be better and cheaper partly because of releases like this one.
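Here's what that triage pattern could look like in practice: classify with cheap V4-Flash, escalate only hard cases to a pricier model. This sketch assumes V4 is served through DeepSeek's existing OpenAI-compatible API endpoint; the model id is a placeholder, since the preview's production id isn't confirmed here:

```python
# Triage layer: cheap classification with V4-Flash, escalation elsewhere.
# Assumes DeepSeek's OpenAI-compatible API; "deepseek-v4-flash" is a
# PLACEHOLDER model id, not a confirmed one.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

def triage(ticket: str) -> str:
    """Return 'simple' or 'escalate' for a support ticket."""
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",  # placeholder model id
        messages=[
            {"role": "system",
             "content": "Label the ticket 'simple' or 'escalate'. Reply with one word."},
            {"role": "user", "content": ticket},
        ],
        max_tokens=4,
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

if triage("Password reset link not arriving") == "escalate":
    ...  # hand off to a stronger (more expensive) model
```

At $0.28/M output, a classification call like this costs a small fraction of a cent, which is why the triage-then-escalate pattern suddenly pencils out.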
Follow @easyai.ai for more breakdowns like this.
---
Sources
- CNN
- CNBC
- Simon Willison analysis
- Analytics India Magazine
- Startup Fortune
- DeepSeek API docs
- NxCode specs
---
Want more?
Browse our prompt packs, guides, and automation tools.