Why does DeepSeek have more influence than Qwen when their performance is so close?

🌟 Insights from Zhihu contributor 情酱

Here’s a view that may not be universally popular: DeepSeek’s bigger influence is not really about which model is stronger. The real gap comes from something else: DeepSeek is willing to show the entire research process. 🔍

I’ve been following both DeepSeek and Qwen’s technical reports since the V3 era. The difference is obvious. Reading DeepSeek’s papers feels like sitting down with an engineer who walks you through the whole journey: what they tried, where they failed, why they chose one path over another, and even which experiments did not work.

A recent example is DeepSeek V4, released on April 24. Its 58-page technical report was uploaded to Hugging Face on the same day. Chinese tech media even described it as “surprisingly detailed.”

Before V4, DeepSeek released several full research papers, including mHC, Engram, and DualPath. When the V4 report came out, the community could check directly: which techniques landed, which did not, and why.

But the part I respect most is not the technical detail itself. It is the honesty. 🧠

In the V4 report, DeepSeek essentially said that V4-Pro-Max outperforms GPT-5.2 and Gemini-3.0-Pro in reasoning, but still slightly lags behind GPT-5.4 and Gemini-3.1-Pro. It also admitted that its development trajectory is roughly 3 to 6 months behind the frontier closed-source models.

Think about that. An AI company wrote in its own technical report: “We are still 3 to 6 months behind the very best models.” That level of transparency is rare in the AI industry.

R1 followed the same philosophy. The paper explained how GRPO was designed, why PPO was not used, and listed hyperparameters and learning rates. It even included a Negative Results section, explaining that PRM and MCTS were tried and failed, along with the reasons behind those failures.
Later versions added more training details, including the RL training architecture based on vLLM.

Why does this matter? Because research teams around the world can actually reproduce the work. 🌍

Hugging Face’s Open R1 team trained models using DeepSeek’s public methods, and the results were close to the official distilled version. That is much more persuasive than posting 100 benchmark charts.

Now compare this with Qwen. To be clear, this is not about attacking Qwen. Qwen’s engineering capability is genuinely impressive. Over the past few years, it has open-sourced hundreds of models. The Qwen3.5 series covers everything from 0.8B to 397B parameters. It uses Gated Delta Networks, Early Fusion for native multimodality, a Thinker-Talker architecture for end-to-end audio and video, and a 397B-parameter model that activates only 17B parameters. The technical depth is real. ⚙️

But when you read Qwen’s technical reports, the feeling is different. For Qwen2.5, some readers said the report could be finished in 10 minutes and did not go deep into the technical principles. Much of it focused on benchmark comparisons. With Qwen3 and Qwen3.5, even though the architecture changed meaningfully, it is still hard to find much discussion about training problems, failed attempts, abandoned directions, or lessons learned.

That creates a subtle contrast. Every time Qwen releases a new model, the marketing noise is huge: “beats DeepSeek,” “far ahead,” “the world’s No.1 open-source model.” But once the hype cools down and people read the report, there often isn’t that much new methodology to learn from. And those benchmark gaps can disappear in the next iteration. Over time, people start to feel: big thunder, light rain.

DeepSeek is almost the opposite. Its releases are not always loud. V4 took 484 days to arrive, with constant rumors and speculation in between.
But once the paper dropped, the 58-page report explained everything clearly, from mHC to hybrid attention to the Muon optimizer. The community reaction was immediate. 🚀

To me, these two companies represent two very different open-source philosophies. Qwen feels more like a big tech product route: “I give you the model weights. You can use them. My report tells you how strong I am and where I rank on benchmarks.” That is open-sourcing the result.

DeepSeek feels more like a research-driven route: “I give you the model, but I also show you the full research process: the training pipeline, failed paths, hyperparameters, trade-offs, and even how far I still am from the best closed-source models.” That is open-sourcing the process. 🔬

And for the community, the difference is huge. If you give me model weights, I can fine-tune, deploy, and build products. But if you give me a complete training methodology, I can stand on your shoulders, improve your approach, and explore directions you have not tried yet.

GRPO is the best example. It was first released with DeepSeekMath in April 2024 and did not attract much attention at the time. But after R1, the entire industry rushed to reproduce it. Kimi, GLM, and many other teams started using similar ideas. By V4, DeepSeek even borrowed the Muon optimizer from Kimi to replace AdamW, and openly credited the source in its paper.

That is what a healthy open-source ecosystem looks like: 🔁

• Your method can be reproduced by others.
• My method can be adopted by you.
• Everyone pushes the field forward together.

Qwen is also open-source, but more often it shows the moves rather than teaching people how to train.

So why does DeepSeek have greater influence? Not because it crushes Qwen on performance. In many benchmarks, the two are actually close, and Qwen is stronger in some areas. DeepSeek V4 itself admits it still trails the top closed-source models by 3 to 6 months.
The real reason is that DeepSeek did something harder: it publicly exposed its failures, limitations, and full methodology. A team that dares to write “we are still 3 to 6 months behind” in its own paper is hard not to trust.

Influence is not built by chasing benchmark scores. It is built through trust, one paper at a time. ✍️

🔗 Original article: https://t.co/aBs1aXb8j3

#DeepSeek #Qwen #OpenSourceAI #LLM #AIResearch #MachineLearning #AI
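I have been working with DeepSeek Pro Max for several days in a row, and it completes tasks well. It is not merely better than GPT-5.2, as DeepSeek itself says; in my experience it sits somewhere between GPT-5.3 and GPT-5.4, closer to 5.4. Over the past few days I switched to GPT-5.4 only once, for a single bug fix. The rest of the time, working with DeepSeek Pro Max did not feel like it held my work back.

[Quoted] Why does DeepSeek have more influence than Qwen, even though their performance is very close?

Here is a view that may not be very popular: DeepSeek's bigger influence is not about which model is stronger. The real gap comes from something else: DeepSeek is willing to show the entire research process.

I have been following both companies' technical reports since the V3 era. The difference is obvious. Reading DeepSeek's papers feels like sitting down to talk with an engineer who walks you through the whole journey: what they tried, where they failed, why they chose this path, and even which experiments did not work.

A recent example is DeepSeek V4, released on April 24. Its 58-page technical report was uploaded to Hugging Face the same day, and Chinese tech media even described it as "surprisingly detailed." Before V4, DeepSeek had also released full research papers on mHC, Engram, and DualPath, so when the report came out, the community could check directly: which techniques landed, which did not, and why.

But what I respect most is not the technical detail itself. It is the honesty. In the V4 report, DeepSeek wrote plainly that V4-Pro-Max surpasses GPT-5.2 and Gemini-3.0-Pro in reasoning, but still slightly lags behind GPT-5.4 and Gemini-3.1-Pro. It also admitted that its development trajectory is roughly 3 to 6 months behind the frontier closed-source models.

Think about that: an AI company writing in its own technical report, "We are still 3 to 6 months behind the very best models." That level of transparency is extremely rare in the AI industry.

R1 followed the same philosophy. The paper explained the design thinking behind GRPO and why PPO was not used, listed hyperparameters and learning rates, and even had a "negative results" section explaining why PRM and MCTS failed after being tried. Later versions added details of the RL training architecture based on vLLM.

Why does this matter? Because research teams around the world can actually reproduce the work. Hugging Face's Open R1 team trained models using DeepSeek's public methods, and the results were close to the official distilled version. That is far more persuasive than publishing 100 benchmark charts.

Now compare this with Qwen. This is not an attack on Qwen. Its engineering capability is real: it has open-sourced hundreds of models over the past few years, the Qwen3.5 series ranges from 0.8B to 397B, and the technical depth is not in question. But reading Qwen's technical reports feels completely different. The Qwen2.5 report can reportedly be read in 10 minutes, and much of it is benchmark comparison. Qwen3 and Qwen3.5 changed the architecture substantially, yet it is still hard to find discussion of training problems, failed attempts, abandoned directions, or lessons learned.

That creates a subtle contrast. Every time Qwen releases a new model, the marketing noise is loud: "beats DeepSeek," "far ahead," "the world's strongest open-source model." But once the hype fades and people have read the report, there is often not much new methodology to find. Those benchmark gaps can disappear in the next version. Over time, people start to feel: loud thunder, little rain.

DeepSeek is almost the reverse. V4 took 484 days to arrive, with constant rumors in between. But once the paper was released, its 58 pages explained everything clearly, from mHC to hybrid attention to the Muon optimizer. The community reaction was immediate.

As I see it, the two companies represent two very different open-source philosophies:

- Qwen is more like a big-tech product route: "I give you the model weights, and you can use them. My report tells you how strong I am and where I rank." What it open-sources is the result.
- DeepSeek is more like a research-driven route: "I give you the model, and I also give you the full research process: the training pipeline, failed paths, hyperparameters, trade-offs, and even how far I still am from the best closed-source models." What it open-sources is the process.

For the community, the difference is huge. Give me model weights, and I can fine-tune, deploy, and build products. Give me a complete training methodology, and I can stand on your shoulders, improve your approach, and explore directions you have not yet tried.

GRPO is the best example. When it first appeared with DeepSeekMath in April 2024, nobody paid attention; after R1, the whole industry rushed to reproduce it. Kimi, GLM, and others began using similar ideas. By V4, DeepSeek even borrowed the Muon optimizer from Kimi to replace AdamW, and explicitly credited the source in its paper. That is what a healthy open-source ecosystem looks like: your method is reproduced by me, my method is adopted by you, and everyone pushes the field forward together.

The real reason DeepSeek has more influence is not that it crushes Qwen on benchmarks. It is that DeepSeek did something harder: it publicly showed its failures, limitations, and full methodology. A team that dares to write "we are still 3 to 6 months behind" in its own paper is hard not to trust.

Influence is not built by chasing benchmark scores. It is built on trust, accumulated one paper at a time.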