On April 20, 2026, Moonshot AI released Kimi K2.6 — and the open-source AI landscape changed overnight. The numbers were not incremental. They were the kind that shift how the entire industry thinks about what open-weight models can achieve without a closed lab's compute budget.
The headline: 80.2% on SWE-Bench Verified — the gold-standard real-world coding benchmark — making K2.6 the top open-weight model on the leaderboard and tying GPT-5.5 on SWE-Bench Pro. At $0.60 per million input tokens, it costs roughly one-quarter of Claude Opus. And it's fully open-source under a Modified MIT License. This is the complete breakdown of everything you need to know about K2.6 before deploying it.
K2.6 is the fourth major Kimi release: K2 (July 2025) → K2.5 (January 2026) → K2.6 (April 2026). Each has narrowed the gap with closed frontier models. K2.6 is the first open-weight model to genuinely tie GPT-5.5 on a real-world coding benchmark. Important: the kimi-k2 series is officially discontinued May 25, 2026. K2.6 is the only supported branch going forward.
Architecture: What Makes K2.6 Different
Kimi K2.6 is built on a Mixture-of-Experts (MoE) architecture — the same foundational approach as GPT-4 and Gemini 1.5 Pro, but with a larger expert pool and a specialised training curriculum focused on coding and agentic tasks. Understanding the architecture explains both the performance and the cost advantage.
The MoE design means that despite having 1 trillion total parameters, only 32 billion are activated per token — giving K2.6 both the knowledge capacity of a 1T dense model and the inference efficiency of a 32B model. This is the primary reason it can run at $0.60/M input tokens while matching closed frontier models on coding tasks. The 256K token context window — double ChatGPT's 128K — is a further architectural advantage for large codebase and multi-document tasks.
K2.6 is also natively multimodal: it accepts text, code, images, and documents as input. This is what enables its website generation from screenshots and design mockups — you can upload a Figma frame and get working HTML/CSS back.
Agent Swarm: K2.6's Defining Capability
Agent Swarm is K2.6's headline agentic architecture — and it is a genuine architectural change, not a prompting trick. It enables K2.6 to decompose complex tasks into parallel sub-agent workstreams, coordinate their outputs, and synthesise a coherent final result.
K2.5 capped at 100 sub-agents and 1,500 steps. K2.6 triples both to 300 sub-agents and 4,000 coordinated steps. This isn't merely a bigger number — it enables qualitatively different task decomposition. A task that would cause K2.5 to lose coherence after 50 steps can now be reliably completed end-to-end. Moonshot's reference benchmark shows K2.6 sustaining a 12-hour autonomous coding session making over 4,000 tool calls without producing contradictory output across sub-agents.
Agent Swarm is what powers K2.6's full-stack website generation: one sub-agent handles API design, one handles front-end components, one handles database schema, one handles tests — all running in parallel, outputs synthesised into a coherent codebase. No other open-weight model offers this depth of agentic orchestration.
Complete Benchmark Results: Honest Numbers
K2.6 leads on coding and agentic tasks. Closed models retain leads on science reasoning and hard maths. Here is the complete picture without cherry-picking:
| Benchmark | Kimi K2.6 | GPT-5.5 | Claude Opus 4.7 | What It Tests |
|---|---|---|---|---|
| SWE-Bench Verified | 80.2% ✓ | ~79% | ~78% | Real GitHub issue resolution |
| AA Intelligence Index | 54 (best open) ✓ | 60 | 58 | Composite intelligence score |
| Terminal-Bench 2.0 | 66.7% ✓ | ~65% | ~64% | Autonomous execution tasks |
| BrowseComp | 86.3% | ~88% ✓ | ~87% | Web browsing + comprehension |
| GPQA-Diamond | 90.5% | 92.8% ✓ | 91.2% | Grad-level science reasoning |
| AIME 2026 | 96.4% | 99.2% ✓ | 97.8% | Math olympiad problems |
| Code Arena WebDev Elo | 1,529 (Rank 6/67) ✓ | ~1,560 | ~1,545 | Human-preference coding eval |
| Inference Speed | 58.6 t/s ✓ | ~50 t/s | ~45 t/s | Tokens/sec on official API |
Pricing: The Economics That Change Everything
This is where the K2.6 story becomes most compelling for teams running AI in production. It is not just competitively priced — it is structurally cheaper in a way that changes what is economically viable to build. At $0.60/M input tokens versus Claude Opus 4.7's $15.00/M, you can afford to run agent pipelines that would be financially impossible with closed models. See all 6 ways to access Kimi for free in our dedicated guide.
| Model | Input ($/M) | Output ($/M) | Context | vs K2.6 |
|---|---|---|---|---|
| Kimi K2.6 | $0.60 ✓ | $2.50–$3.00 ✓ | 256K | — |
| GPT-5.5 | $5.00–$10.00 | $15.00–$30.00 | 128K | 8–10× costlier |
| Claude Opus 4.7 | $15.00 | $75.00 | 200K | 25× costlier |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | 5× costlier |
| GPT-4o | $2.50 | $10.00 | 128K | 4× costlier |
| Cloudflare (free tier) | Free (10K neurons/day) | Free | 256K | Free ✓ |
A team processing 100 million tokens/month through coding agents — PR reviews, test generation, codebase analysis — pays approximately $600/month with K2.6 vs $15,000/month with Claude Opus. That $14,400/month difference changes what you can afford to automate entirely.
Open Source License: What It Actually Allows
K2.6 is released under a Modified MIT License — critically more permissive than many "open-source" AI releases that quietly restrict commercial use. Here is precisely what the licence allows and where the single restriction lies:
Commercial Use — Allowed for Most
You can use K2.6 weights in commercial SaaS, internal tools, client products, and API services without attribution or licensing fees — for most organisations.
Self-Hosting — Fully Allowed
Download weights from Hugging Face (moonshotai/Kimi-K2.6), deploy on your own infrastructure using vLLM, SGLang, or llama.cpp. Zero per-token cost, full data privacy, no dependency on Moonshot's API.
Fine-Tuning — Fully Allowed
Fine-tune K2.6 on your own domain data — legal, medical, finance, proprietary codebase. Derivative models and fine-tuned checkpoints are permitted under the Modified MIT terms.
Attribution Required at Scale Only
The modified part: attribution ("Kimi K2.6 by Moonshot AI") is required only for platforms exceeding 100 million monthly active users OR $20 million monthly revenue. Below those thresholds, no attribution required whatsoever.
Self-Hosting: Hardware Requirements & Setup
Self-hosting gives you zero per-token cost, complete data sovereignty, and unlimited inference. The hardware requirements are serious, but there are practical paths for different resource levels:
Full K2.6 model requires approximately 250GB+ combined VRAM + RAM for INT4-quantized inference (running at ~32B activation cost). Non-quantized needs ~600GB+. Most individuals use community quantized versions via Unsloth or llama.cpp. Recommended production: vLLM or SGLang with tensor parallelism across multiple H100 80GB GPUs.
The 256K Context Window Advantage
K2.6's 256K token context window is double ChatGPT's 128K limit. In practice, this means K2.6 can hold entire codebases, full contract bundles, or multi-hour conversation histories in a single request — enabling the kinds of complex, large-scale website and app generation tasks that would require chunking workarounds on shorter-context models. Explore the full implications in our guide: Kimi AI Long Context Explained.
256K tokens ≈ 200,000 words · full novel · entire mid-sized codebase
Controversy & Competitive Context
In February 2026, Anthropic publicly accused Moonshot AI (with DeepSeek and MiniMax) of using fraudulent accounts to generate Claude conversations for training data distillation. Moonshot has neither confirmed nor denied this. It remains unresolved. If your organisation has strict AI supply-chain compliance requirements, this is worth factoring into your evaluation.
In March 2026, Cursor — a $50B code editor — was found to be using Kimi K2.5 as the underlying model for its Composer 2 feature without initial disclosure. Co-founder Aman Sanger confirmed: "It was a miss to not mention the Kimi base in our blog from the start." Now a disclosed partnership. That a $50B company chose a Chinese open-source model for its flagship coding feature tells you everything about K2.6's real-world quality ceiling.
Frequently Asked Questions
kimi-k2.6 in their API calls before that date. K2.6 is the only supported branch going forward.