
    DeepSeek V4 is out, touting some disruptive wins over Gemini, ChatGPT, and Claude



    China’s DeepSeek has a habit of showing up, uninvited, to Silicon Valley’s AI party, and this time, it has done so with the long-awaited V4 preview. The Hangzhou-based company has released its latest AI model, which beats popular American models in certain areas. 

    DeepSeek has launched two new models: V4-Pro (Expert mode) and V4-Flash (Instant mode). While the former is a massive 1.6 trillion parameter model, the latter is at a more manageable 284 billion parameters. However, both of them have a one-million-token context window. 

    🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

    🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models.
    🔹 DeepSeek-V4-Flash: 284B total / 13B active params.… pic.twitter.com/n1AgwMIymu

    — DeepSeek (@deepseek_ai) April 24, 2026

    What exactly did DeepSeek release?

    Just as important, both models are open source, meaning they’re available to download from Hugging Face and run locally on your own hardware. However, V4-Pro’s sheer scale makes that impractical for most people: even stored at 8-bit precision, its 1.6 trillion parameters work out to roughly 1.6 TB of weights.
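    A minimal back-of-envelope sketch of that arithmetic, plus a download call via the huggingface_hub library, follows; the repo IDs are assumptions based on DeepSeek’s usual naming, not confirmed paths.

```python
# Back-of-envelope weight-size estimate for both models, plus a download
# sketch. The repo IDs below are hypothetical; check DeepSeek's official
# Hugging Face org page for the real ones.
from huggingface_hub import snapshot_download

BYTES_PER_PARAM = 1  # assumes 8-bit (FP8/INT8) quantized weights

for name, total_params in [
    ("DeepSeek-V4-Pro", 1.6e12),   # 1.6T total parameters
    ("DeepSeek-V4-Flash", 284e9),  # 284B total parameters
]:
    weight_gb = total_params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{weight_gb:,.0f} GB of weights at 8-bit")

# Fetch the smaller model (still hundreds of GB on disk):
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical repo ID
    local_dir="DeepSeek-V4-Flash",
)
```

    Note that mixture-of-experts models activate only a fraction of their parameters per token (49B for Pro, 13B for Flash), but all experts still have to be resident in memory for inference, so the totals above are what local hardware must hold.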

    One of the most interesting parts of the announcement is the comparison with popular AI models like Gemini, ChatGPT, and Claude. For instance, V4-Pro punches hard in coding, posting a 3,206 Codeforces rating that clears GPT-5.4’s 3,168 and Gemini 3.1’s 3,052. That makes it the strongest open model for competitive programming tasks.

    On LiveCodeBench, V4-Pro posts 93.5, ahead of Claude Opus 4.6’s 88.8 and Gemini 3.1’s 91.7. It holds up on agentic tasks too, scoring 51.8 on Toolathlon to beat both Claude (47.2) and Gemini (48.8). The faster and more efficient V4-Flash, meanwhile, matches V4-Pro on simple agent tasks at a fraction of the compute cost.

    Where does V4-Pro beat the competition?

| Benchmark | DeepSeek V4-Pro | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Codeforces (Rating) | 3,206 | — | 3,168 | 3,052 |
| LiveCodeBench (Pass@1) | 93.5 | 88.8 | — | 91.7 |
| Apex Shortlist (Pass@1) | 90.2 | 85.9 | 78.1 | 89.1 |
| SWE Verified (Resolved) | 80.6 | 80.8 | — | 80.6 |
| Toolathlon (Pass@1) | 51.8 | 47.2 | 54.6 | 48.8 |
| Terminal Bench 2.0 (Acc) | 67.9 | 65.4 | 75.1 | 68.5 |
| MRCR 1M Long Context | 83.5 | 92.9 | — | 76.3 |
| HMMT 2026 Math | 95.2 | 96.2 | 97.7 | 94.7 |
| IMOAnswerBench | 89.8 | 75.3 | 91.4 | 81.0 |

    There are several areas where DeepSeek’s new model runs behind the competition, though. Claude Opus 4.6 leads on long-context retrieval, scoring 92.9 on MRCR 1M versus V4-Pro’s 83.5, and GPT-5.4 still tops Terminal Bench 2.0 at 75.1 against V4-Pro’s 67.9.

    Where DeepSeek truly disrupts the competition is pricing. V4-Pro costs $3.48 per million output tokens, compared with OpenAI’s $30 and Anthropic’s $25 for equivalent workloads, which is likely to sound far more attractive to potential customers. That gap is enormous for everyday developers building AI-powered apps.
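    For a concrete sense of that gap, here is a rough comparison using only the per-million output-token rates quoted above; the monthly traffic figure is a made-up example, and input-token pricing (which differs per provider) is ignored.

```python
# Rough monthly cost comparison using the output-token rates quoted in
# the article. The traffic figure is a hypothetical example, and
# input-token pricing (which differs per provider) is ignored.
rates_per_m_output = {  # USD per 1M output tokens
    "DeepSeek V4-Pro": 3.48,
    "OpenAI (equivalent tier)": 30.00,
    "Anthropic (equivalent tier)": 25.00,
}

monthly_output_tokens = 500e6  # hypothetical app: 500M output tokens/month

for provider, rate in rates_per_m_output.items():
    cost = monthly_output_tokens / 1e6 * rate
    print(f"{provider}: ${cost:,.2f}/month")
# DeepSeek V4-Pro: $1,740.00/month
# OpenAI (equivalent tier): $15,000.00/month
# Anthropic (equivalent tier): $12,500.00/month
```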


