In today's Generative AI lecture, we dive into reasoning models by dissecting how DeepSeek-R1 works (GRPO, which unlike PPO removes the need for a separate value network, plus training with a simpler rule-based reward), and end on mechanistic interpretability to better understand those reasoning traces.
November 10, 2025 at 8:46 PM
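To make the GRPO point concrete, here is a minimal sketch of the group-relative advantage: several completions are sampled for one prompt, each is scored by a rule, and the group mean stands in for the baseline that PPO would get from a learned value network. The reward rule (`rule_based_reward`, checking the text after an `Answer:` marker), the sample completions, and the use of the population standard deviation are illustrative assumptions, not DeepSeek's actual implementation. The clipped policy-ratio objective and the KL penalty are left out.

```python
from statistics import mean, pstdev


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule: 1.0 if the text after the last 'Answer:' equals the reference."""
    marker = "Answer:"
    if marker not in completion:
        return 0.0  # no parseable final answer
    predicted = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if predicted == reference_answer else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group (no value network)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up completions sampled for the same prompt ("What is 2 + 2?").
completions = [
    "2 + 2 is 4. Answer: 4",
    "I think it is five. Answer: 5",
    "Answer: 4",
    "No final answer given",
]
rewards = [rule_based_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct completions end up with a positive advantage and incorrect ones with a negative advantage. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.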
A study shows that systems such as GPT-4o and DeepSeek R1 cannot reliably recognize first-person false beliefs
Language models still confuse beliefs with facts
ultimasnoticias.com.ve
November 10, 2025 at 8:17 PM
Kimi K2 Thinking compared with GPT-5, DeepSeek R1, Claude Opus 4, Qwen 3, and Grok 4, and how it achieves the same or better reasoning at 10x lower cost.
Discover benchmarks, business applications, pricing, implementation strategies and more here:
ilyasiqbal.com/2025/11/09/k...
Kimi K2 Thinking: A New Frontier for Agentic AI, Benchmarks and Pricing - Ilyas Iqbal
Kimi K2 Thinking revolutionizes AI with open-source GPT-5 level reasoning at 10x lower cost. Discover benchmarks, business applications, and implementation strategies.
ilyasiqbal.com
November 9, 2025 at 9:10 PM
Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509, and found that the classifier they developed could identify AI-generated replies with 70%-80% accuracy.
November 9, 2025 at 2:31 PM
What are the features of the DeepSeek R1 model?
DeepSeek R1 centers on localized Traditional Chinese understanding, combining efficient inference with lightweight deployment, suited to edge and cloud scenarios for enterprises in Taiwan. It offers local-context training, modules for the finance and manufacturing industries, and secure governance of smart-city data, with data kept locally in compliance with Taiwan's Personal Data Protection Act and cloud compliance requirements. Multilingual support and real-time multimodal capabilities improve customer service, technical support, and analytics efficiency, giving users in Taiwan a stable, low-latency AI solution.
www.isuperman.tw
November 8, 2025 at 11:49 PM
“large reasoning models, such as OpenAI’s o1 (Jaech et al., 2024), Qwen-QwQ (Team), and DeepSeek-R1 (Team, 2024), have demonstrated impressive stepwise reasoning capabilities over long sequences through large-scale reinforcement learning.” arxiv.org/abs/2502.04644
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Agentic Reasoning dynamically leverages web search, code execu...
arxiv.org
November 8, 2025 at 4:05 PM
A video I made a while back.
Battle for the strongest free AI! Comparing the output quality of GPT OSS, Qwen3, and DeepSeek R1
🔥 GPT-OSS vs Qwen3 vs DeepSeek-R1 in-depth comparison! The open-source AI showdown 🔥
This time, OpenAI released...
URL: https://www.youtube.com/watch?v=rLb71_GPQLI
November 8, 2025 at 12:18 PM
this morning, X is saturated with people from US claiming that their favorite unknown benchmark (that happens to show K2 trailing US models) is actually the best single benchmark to watch
lol notice how they clipped off the top 12
November 8, 2025 at 12:10 PM
I know my Titan XP is old, but I'm disappointed that Deepseek-R1 requires 2 of them just to do inference at the int4 level. I would need 40 to do full training at float32! Madness.
November 7, 2025 at 6:19 PM
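The card counts in that post can be roughly reproduced with back-of-the-envelope memory arithmetic. The sketch below assumes 12 GB per Titan card, a 32-billion-parameter model (an assumption: the post does not say which R1 variant or distill was meant), 0.5 bytes per parameter at int4, and 16 bytes per parameter for float32 training with Adam (4 for weights, 4 for gradients, 8 for the two optimizer moments). Activations, KV cache, and framework overhead are ignored, so real requirements are higher. The helper name `cards_needed` is hypothetical.

```python
import math


def cards_needed(num_params: float, bytes_per_param: float, card_memory_gb: float) -> int:
    """Smallest number of cards whose combined memory holds the given per-parameter state."""
    total_gb = num_params * bytes_per_param / 1e9
    return math.ceil(total_gb / card_memory_gb)


PARAMS = 32e9           # ASSUMPTION: model size is not stated in the post
CARD_GB = 12.0          # memory per Titan card
INT4_BYTES = 0.5        # 4-bit weights only
FP32_ADAM_BYTES = 16.0  # 4 weights + 4 gradients + 8 Adam moments

inference_cards = cards_needed(PARAMS, INT4_BYTES, CARD_GB)
training_cards = cards_needed(PARAMS, FP32_ADAM_BYTES, CARD_GB)
print(inference_cards, training_cards)
```

Under these assumptions the estimate comes out to 2 cards for int4 inference and 43 for float32 training, in the same ballpark as the 2 and 40 quoted in the post.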
Models show varying error patterns. Claude and some GPT-family models underperform on tasks that require outputting dates; Gemini and Deepseek-R1 frequently over-reason and fail to return an answer at all on Oolong-synth, although Gemini is the best model on Oolong-real.
November 7, 2025 at 5:07 PM
So rather than "never mention it again" - I assume that poster was talking about DeepSeek-R1 - they haven't *been in the news* like the R1 release was, but they've been doing the work - DeepSeek and others - and putting it out there.
November 7, 2025 at 12:42 PM
Compared to the DeepSeek R1 release, K2 Thinking seems to be making relatively few waves.
I guess most peeps are default-assuming the benchmarks are seriously gamed.
November 7, 2025 at 12:27 PM
Best breakdown of modern LLM architectures
From DeepSeek to GPT-OSS, it’s all here ↓
Covers every flagship model
1️⃣ DeepSeek V3/R1
2️⃣ OLMo 2
3️⃣ Gemma 3
4️⃣ Mistral Small 3.1
5️⃣ Llama 4
6️⃣ Qwen3
7️⃣ SmolLM3
8️⃣ Kimi K2
9️⃣ GPT-OSS
#ArtificialIntelligence #MachineLearning #DeepLearning #DataScience #Analytics
November 7, 2025 at 12:27 PM
K2-Thinking is SOTA, top model in agentic tool calling
November 7, 2025 at 10:40 AM
I just saw the Kimi K2 Thinking release!
Kimi K2 is based on the DeepSeek V3/R1 architecture, and here's a side-by-side comparison.
In short, Kimi K2 is a slightly scaled DeepSeek V3/R1. And the gains are in the data and training recipes. Hopefully, we will see some details on those soon, too.
November 6, 2025 at 7:35 PM
Reminder>> Evaluating the Accuracy of the DeepSeek-R1 Large Language Model for Detecting Errors in Emergency Radiology Reports (preprint) #openscience #PeerReviewMe #PlanP
Evaluating the Accuracy of the DeepSeek-R1 Large Language Model for Detecting Errors in Emergency Radiology Reports
Date Submitted: Oct 31, 2025.
Open Peer Review Period: Nov 3, 2025 - Dec 29, 2025.
dlvr.it
November 6, 2025 at 6:02 PM
the "Deepseek Moment" was also because the prose of R1 was so refreshing compared to the free ChatGPT they had used until then.
Kimi K2 non-thinking already had incredible writing, so...
November 6, 2025 at 3:55 PM
Origin
ostatus.taiyolab.com
November 6, 2025 at 9:13 AM
deepseek-r1:7b delivers quality you would not expect from a local model, but the wait time of a reasoning model is a poor fit for this kind of snippet.
November 6, 2025 at 9:08 AM
Startup Fortytwo published benchmarks claiming small models running on personal computers outperform OpenAI's GPT-5, Google Gemini 2.5 Pro, Anthropic Claude Opus 4.1 and DeepSeek R1 on reasoning tests, advocating decentralized swarm inference and crypto-based rewards.
Get Ready to Hear a Lot About Robot and AI 'Swarms'
Thousands of robo-brains are better than one.
gizmodo.com
November 5, 2025 at 6:25 PM
A study shows that systems such as GPT-4o and DeepSeek R1 cannot reliably recognize first-person false beliefs, which could have serious consequences in fields such as medicine, law, or journalism
Language models still confuse beliefs with facts, even the most advanced ones
www.agenciasinc.es
November 4, 2025 at 8:56 AM
Canonical just made local GenAI easy on Ubuntu! New release offers hardware-optimized install for open-source models. Good to see they chose DeepSeek R1 and Qwen 2.5 VL—the top performers in my own local tests. Great accessibility move! ubuntu.com/blog/genai-i...
#Ubuntu #GenAI #OpenSourceAI
Why we brought hardware-optimized GenAI inference to Ubuntu | Ubuntu
Locally install Intel and Ampere's silicon-optimized DeepSeek R1 and Qwen 2.5 VL on Ubuntu 24.04 LTS.
ubuntu.com
November 4, 2025 at 8:40 AM
How DeepSeek’s R1 Model Challenges OpenAI’s Dominance
Exploring the rise of DeepSeek R1, the reasoning-driven AI model disrupting the global AI race and reshaping the future of machine… Continue reading on Artificial Intelligence in Plain English »
Origin
ai.plainenglish.io
November 4, 2025 at 6:30 AM
How DeepSeek’s R1 Model Challenges OpenAI’s Dominance Exploring the rise of DeepSeek R1, the reasoning-driven AI model disrupting the global AI race and reshaping the future of machine… Conti...
#software-development #openai #artificial-intelligence #technology #machine-learning
Origin | […]
Original post on ai.plainenglish.io
ai.plainenglish.io
November 4, 2025 at 6:30 AM