
    DeepSeek V4 is out, touting some disruptive wins over Gemini, ChatGPT, and Claude



    China’s DeepSeek has a habit of showing up, uninvited, to Silicon Valley’s AI party, and this time, it has done so with the long-awaited V4 preview. The Hangzhou-based company has released its latest AI model, which beats popular American models in certain areas. 

    DeepSeek has launched two new models: V4-Pro (Expert mode) and V4-Flash (Instant mode). While the former is a massive 1.6 trillion parameter model, the latter is at a more manageable 284 billion parameters. However, both of them have a one-million-token context window. 

    🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

    🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models.
    🔹 DeepSeek-V4-Flash: 284B total / 13B active params.… pic.twitter.com/n1AgwMIymu

    — DeepSeek (@deepseek_ai) April 24, 2026

    What exactly did DeepSeek release?

    Just as important, both models are open source, meaning they’re available to download from Hugging Face and run locally on your own hardware. However, V4-Pro’s sheer scale makes that impractical for most people: even stored at 8-bit precision, its 1.6 trillion parameters work out to roughly 1.6 TB of weights.
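    A minimal back-of-envelope sketch of that arithmetic, plus a download call via the huggingface_hub library, follows; the repo IDs are assumptions based on DeepSeek’s usual naming, not confirmed paths.

```python
# Back-of-envelope weight-size estimate for both models, plus a download
# sketch. The repo IDs below are hypothetical; check DeepSeek's official
# Hugging Face org page for the real ones.
from huggingface_hub import snapshot_download

BYTES_PER_PARAM = 1  # assumes 8-bit (FP8/INT8) quantized weights

for name, total_params in [
    ("DeepSeek-V4-Pro", 1.6e12),   # 1.6T total parameters
    ("DeepSeek-V4-Flash", 284e9),  # 284B total parameters
]:
    weight_gb = total_params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{weight_gb:,.0f} GB of weights at 8-bit")

# Fetch the smaller model (still hundreds of GB on disk):
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical repo ID
    local_dir="DeepSeek-V4-Flash",
)
```

    Note that mixture-of-experts models activate only a fraction of their parameters per token (49B for Pro, 13B for Flash), but all experts still have to be resident in memory for inference, so the totals above are what local hardware must hold.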

    One of the most interesting parts of the announcement is the comparison with popular AI models like Gemini, ChatGPT, and Claude. For instance, V4-Pro punches hard in coding, posting a 3,206 Codeforces rating that clears GPT-5.4’s 3,168 and Gemini 3.1’s 3,052. That makes it the strongest open model for competitive programming tasks.

    On LiveCodeBench, V4-Pro posts 93.5, ahead of Claude Opus 4.6’s 88.8 and Gemini 3.1’s 91.7. It holds up on agentic tasks too, scoring 51.8 on Toolathlon to beat both Claude (47.2) and Gemini (48.8). The faster and more efficient V4-Flash, meanwhile, matches V4-Pro on simple agent tasks at a fraction of the compute cost.

    Where does V4-Pro beat the competition?

| Benchmark | DeepSeek V4-Pro | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Codeforces (Rating) | 3,206 | — | 3,168 | 3,052 |
| LiveCodeBench (Pass@1) | 93.5 | 88.8 | — | 91.7 |
| Apex Shortlist (Pass@1) | 90.2 | 85.9 | 78.1 | 89.1 |
| SWE Verified (Resolved) | 80.6 | 80.8 | — | 80.6 |
| Toolathlon (Pass@1) | 51.8 | 47.2 | 54.6 | 48.8 |
| Terminal Bench 2.0 (Acc) | 67.9 | 65.4 | 75.1 | 68.5 |
| MRCR 1M Long Context | 83.5 | 92.9 | — | 76.3 |
| HMMT 2026 Math | 95.2 | 96.2 | 97.7 | 94.7 |
| IMOAnswerBench | 89.8 | 75.3 | 91.4 | 81.0 |

    There are several areas where DeepSeek’s new model runs behind the competition, though. Claude Opus 4.6 leads on long-context retrieval, scoring 92.9 on MRCR 1M versus V4-Pro’s 83.5, and GPT-5.4 still tops Terminal Bench 2.0 at 75.1 against V4-Pro’s 67.9.

    Where DeepSeek truly disrupts the competition is pricing. V4-Pro costs $3.48 per million output tokens, compared with OpenAI’s $30 and Anthropic’s $25 for equivalent workloads, which is likely to sound far more attractive to potential customers. That gap is enormous for everyday developers building AI-powered apps.
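    For a concrete sense of that gap, here is a rough comparison using only the per-million output-token rates quoted above; the monthly traffic figure is a made-up example, and input-token pricing (which differs per provider) is ignored.

```python
# Rough monthly cost comparison using the output-token rates quoted in
# the article. The traffic figure is a hypothetical example, and
# input-token pricing (which differs per provider) is ignored.
rates_per_m_output = {  # USD per 1M output tokens
    "DeepSeek V4-Pro": 3.48,
    "OpenAI (equivalent tier)": 30.00,
    "Anthropic (equivalent tier)": 25.00,
}

monthly_output_tokens = 500e6  # hypothetical app: 500M output tokens/month

for provider, rate in rates_per_m_output.items():
    cost = monthly_output_tokens / 1e6 * rate
    print(f"{provider}: ${cost:,.2f}/month")
# DeepSeek V4-Pro: $1,740.00/month
# OpenAI (equivalent tier): $15,000.00/month
# Anthropic (equivalent tier): $12,500.00/month
```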


