1. K2.5 Instant – fast responses
2. K2.5 Thinking – deep reasoning
3. K2.5 Agent – tool use
4. K2.5 Agent Swarm (Beta) – parallel execution
Test it:
→ kimi.com
→ platform.moonshot.ai
→ HuggingFace: moonshotai/Kimi-K2.5
China keeps shipping powerful open-weight models:
→ DeepSeek V3
→ Qwen
→ GLM
→ Now Kimi K2.5
Meanwhile Moonshot just raised at a $4.3B valuation and is already seeking $5B.
The open-source AI race is heating up fast.
• API: $0.60/M input, $3/M output
• That's ~5× cheaper than GPT-5
• Open weights on HuggingFace
• Modified MIT license
You can self-host if you've got the hardware (~600GB for INT4 quantized).
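A quick back-of-envelope at those listed rates (Python sketch; the token counts are illustrative, not from Moonshot):

```python
# Cost estimate at the listed K2.5 API rates: $0.60 per 1M input tokens,
# $3.00 per 1M output tokens.
INPUT_RATE = 0.60 / 1_000_000   # USD per input token
OUTPUT_RATE = 3.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a long-context request with 100K tokens in, 10K tokens out.
print(round(request_cost(100_000, 10_000), 4))  # → 0.09
```

Even a full 256K-token context works out to well under a dollar of input cost per call at these prices.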
• Feed it a Figma screenshot → get production React
• Video-to-code workflows
• Visual debugging
Moonshot also launched Kimi Code CLI – a terminal agent rivaling Claude Code.
Works with VSCode, Cursor, Zed. Accepts images/videos as input.
K2.5 can spawn and coordinate up to 100 sub-agents in parallel, executing 1,500+ tool calls concurrently.
Result? 4.5× faster task completion vs single-agent mode.
This is trained via their new Parallel-Agent Reinforcement Learning (PARL).
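The fan-out/fan-in pattern behind this can be sketched in a few lines. This is a toy illustration of a coordinator dispatching independent subtasks to parallel workers, not Moonshot's PARL training setup; `sub_agent` is a hypothetical stand-in for a real tool-calling agent:

```python
# Toy sketch of parallel sub-agent coordination: a coordinator fans subtasks
# out to workers, then fans the results back in.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str) -> str:
    # Stand-in for one sub-agent handling a single subtask (search, tests, etc.).
    return f"done: {task}"

def swarm(tasks: list[str], max_workers: int = 8) -> list[str]:
    """Run each subtask on its own worker and collect results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sub_agent, tasks))

print(swarm(["search docs", "run tests", "draft summary"]))
# → ['done: search docs', 'done: run tests', 'done: draft summary']
```

The speedup comes from subtasks that don't depend on each other: wall-clock time approaches the longest single subtask rather than the sum of all of them.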
• MoE architecture (1T total, 32B active)
• Native multimodal – vision trained from day one, not bolted on
• 256K context window
• Thinking + Instant modes
• Beats GPT-5.2 on HLE-Full benchmark
• Tops SWE-Bench Multilingual
Seriously competitive with frontier models.