The Open Weight Models that Matter: June 2026

Chris Clark ·6/27/2026

On this page

1. DeepSeek V4 Flash — The first to cross the agentic rubicon
2. GLM 5.2 — The open model that makes Opus-style agentic coding portable
3. MiniMax M3 — The multimodal long-context model
4. NVIDIA Nemotron 3 Ultra — The U.S. open-weight accelerator
At a glance

A slew of compelling open-weight models have been released in the past few months, including from new players in both China and the US. Additionally, open-weight models are having a moment in the sun as cost becomes a central focus of organizations with scaled AI usage. Contrary to the expectations of many, the intelligence and capability of open-weight models are keeping up with US frontier labs, and have been maintaining a consistent 3-6 month gap for over 18 months. The frontier labs do not (at this moment, anyway) appear to be accelerating away from open-weight labs.

There are tremendous cost-savings opportunities moving workloads from frontier models, to open weight. Inevitably the frontier will move on, but costs will continue to drop for any fixed point of intelligence.

Between the pace of releases, the jagged frontier of capabilities, and new entrants, it’s very hard to know what models you might want to look at, and why. As of June 2026, we believe these are the four open-weight models that matter the most.

1. DeepSeek V4 Flash — The first to cross the agentic rubicon

DeepSeek V4 Flash is the first open-weight model that teams immediately dropped into real agentic pipelines as a plausible substitute for an Anthropic- or OpenAI-class frontier model. The larger V4 Pro variant set the ceiling with a score of 80.6% on SWE-bench Verified, the top open-weights score, matching GPT-5.5-class agentic performance. But it is Flash that broke through, because it captures most of that capability at a price that is on the pareto frontier of performance and cost.

The numbers back the behavior. Flash is MIT-licensed, a ~284B param / ~13B active MoE with a 1M-token context, and it lands at 79.0% on SWE-bench Verified — within ~1.6 points of Pro’s 80.6% and its own ~1.6T / 49B sibling. The model was released April 2026.

Adoption has also been driven largely by price; DeepSeek’s first-party API lists Flash at $0.14 / $0.28 per million tokens (in/out), but does retain data for training purposes. With caching, the realized price is even lower ($0.029/m input tokens). DeepSeek launched the model with this pricing and billed it as a 75% “discount”, and made that pricing permanent as of May. That is roughly 150x cheaper than GPT-5.5’s output costs. DeepSeek, however, retains and trains on your data.

Western hosts (e.g. Fireworks, Together, DeepInfra) that do not train on data charge approximately double the first-party price. Still terrific value for the intelligence. We don’t know the extent to which the value of the training data is what allows DeepSeek’s low pricing, or whether it is compute subsidization, etc. — but it is certainly the case that the low first-party price has created competitive dynamics for all providers that have yielded extraordinarily low prices for this level of intelligence.

OpenRouter exposes each provider’s country of origin in the detail pane on the providers tab of the model page, and allows granular control over which data policies are acceptable to you and your organization.

Caveats: text-only; no image or video input. The first-party API routes data through China, with a ToS that permits training on your data — though, as noted, no-train Western hosts are available at comparable prices. Anecdotally the model is better geared toward technical work, with lower satisfaction on writing and tone. We have also seen reports that prompting needs to be refined versus Anthropic models — Flash performs better with very specific instructions than when relying on its own judgement.

Reach for it when: you want a frontier-class agentic/coding model at a fraction of the cost. Start with Flash; step up to Pro only when the marginal quality is worth the premium.

2. GLM 5.2 — The open model that makes Opus-style agentic coding portable

GLM 5.2 was released in mid-June but the early read is very strong. Where DeepSeek broke through on price, GLM 5.2 appears to be breaking through on planning quality and long-horizon coding.

Artificial Analysis has GLM 5.2 as the #1 open-weight model on its Intelligence Index (v4.1) at 51, ahead of Nemotron 3 Ultra (48), MiniMax M3 (44), DeepSeek V4 Pro (44), and Kimi K2.6 (43) and just ~5 points below Claude Fable 5. It leads open weights on Artificial Analysis’ real-world agentic benchmark (GDPval-AA v2), effectively level with GPT-5.5 xhigh.

Adoption is early, but the model is already available across a meaningful set of third-party hosts. Pricing is not DeepSeek-cheap, but it is still meaningfully below closed frontier coding models. Realized pricing is an OpenRouter weighted-average $0.447 / $3.31 per million tokens (in/out). While cheaper than GPT-5.5 / Opus-class models on a per-token basis, the model tends to think quite a bit and can consume dollars quickly in output tokens.

There is also a geopolitical tailwind. GLM 5.2 landed days after the U.S. export-control directive that forced Anthropic to disable Fable 5 and Mythos 5 broadly in order to prevent foreign-national access. An MIT-weight model with near-frontier coding performance is newly more attractive for organizations that demand continuity.

Caveats: text-only; no image or video input. It is token hungry and can burn money on high thinking models. It is still very new, provider quality will vary. Throughput across providers tops out around ~78 tok/s (vs. DeepSeek V4-Flash at ~84 tok/s).

Reach for it when: Closest drop-in replacement for agentic planning and coding. Excels at architecture planning, repo-scale refactors, or long-running agent tasks. Pick it over DeepSeek when quality and planning matter more than rock-bottom price.

3. MiniMax M3 — The multimodal long-context model

MiniMax M3 is the only model in this group that understands not just text, but image and video natively. If your agent needs to read screenshots, inspect UI states, parse diagrams, or reason over video, M3 is the open-weight lane to test first.

Artificial Analysis has M3 at 44 on its Intelligence Index (v4.1) — tied with DeepSeek V4 Pro and behind GLM 5.2 and Nemotron 3 Ultra, but its differentiation is modality, not raw index rank. More telling is that it lands roughly level with Claude Sonnet 4.6 on GDPval-AA, Artificial Analysis’s real-world agentic benchmark. It is also a real long-context model: ~428B parameters / ~23B active MoE, 1M-token context, and MiniMax Sparse Attention to make million-token inference less absurd.

Realized pricing is attractive at an OpenRouter weighted-average $0.098 / $1.21 per million tokens (in/out) (though prices rise on context lengths above 512k tokens). But as with GLM, cheap tokens do not mean cheap runs. M3 can be verbose, and reasoning-heavy.

The architecture is innovative as well; MSA uses blockwise sparse attention over real K/V blocks rather than a separate retrieval or compression system. The paper reports large attention-compute reductions without major quality loss. In practice, this is the reason M3 can plausibly support whole-repo, long-document, screenshot-heavy, or video-grounded agents at reasonable cost.

Caveats: Not MIT licensed. The weights are out, but under the MiniMax Community License: commercial use requires attribution/notice, and larger commercial products need prior written authorization. Provider quality and full-context support will vary. It is also not the best pure coding model in the group; for text-only agentic coding, GLM 5.2 is likely a better default.

Reach for it when: you need long-context agents with native image or video input. Best fit is UI automation, screenshot-to-code, diagram/document understanding, video-grounded workflows, or mixed-modal repo/document agents. Likely a strong competitor to Gemini Flash models which also excel in multi-modal understanding.

4. NVIDIA Nemotron 3 Ultra — The U.S. open-weight accelerator

NVIDIA’s Nemotron 3 Ultra is the clearest sign that strong open-weight models are not just coming from Chinese labs. It is not the top open model overall, but it is the strongest U.S. open-weight entrant: a serious reasoning model, built for enterprise deployment, with NVIDIA’s hardware and software stack behind it.

The benchmark read is solid. Artificial Analysis has Nemotron 3 Ultra at 48 on its Intelligence Index (v4.1) — second among open-weight models, behind only GLM 5.2 (51) and ahead of MiniMax M3, DeepSeek V4 Pro, and Kimi K2.6, and comfortably the strongest U.S. open-weight entrant. Its differentiation is deployment efficiency. OpenRouter’s weighted-average price is $0.423 / $2.61 per million tokens (in/out), with a separate :free route that is proving extremely popular and driving adoption.

The model is a 550B / 55B-active hybrid Mamba-2 + Transformer MoE, trained with NVFP4, 1M context, Multi-Token Prediction, and an OpenMDW license. NVIDIA also released more than weights: data, recipes, eval tooling, and RL infrastructure. NVIDIA’s incentive is clear: more open-model usage (and a wider array of meaningful models) means more demand for Blackwell/Hopper inference, NIM microservices, CUDA, AI Enterprise, and sovereign AI deployments. Nemotron is a model, but it is also designed to boost the entire open ecosystem, and therefore the NVIDIA AI stack.

Keep an eye on NVIDIA’s work here, as they are incentivized to continue proliferating models across the global ecosystem, and have the deep pockets to fund the research and training to keep up with any lab on the planet.

Caveats: text-only; no image or video input. It trails GLM 5.2 on raw intelligence, and the broader Chinese open frontier leads on peak coding scores. The free route is useful for testing and adoption, not something to build a serious production SLA around. And while the license is permissive, it is OpenMDW, not MIT.

Reach for it when: you want a U.S.-built open-weight model for long-running agents, RAG, orchestration, coding support, or enterprise workflows where speed, deployability, data control, and vendor comfort matter more than absolute benchmark rank. Pick GLM or DeepSeek for peak open coding quality; pick Nemotron when the stack matters.

At a glance

Model	AA Index (v4.1)	Price (in/out per M)	Throughput	Reach for it when
DeepSeek V4 Flash	44 (Pro)	$0.054 / $0.242	~84 tok/s	You want frontier-class agentic coding at the lowest cost on the board — the cost/performance Pareto frontier.
GLM 5.2	51 (top open)	$0.447 / $3.31	~78 tok/s	Planning quality and long-horizon coding matter more than rock-bottom price; the closest open drop-in for Opus-style work.
MiniMax M3	44	$0.098 / $1.21	~59 tok/s	Your agent needs native image or video input over long context — screenshots, UI states, diagrams, documents.
Nemotron 3 Ultra	48	$0.423 / $2.61 (+ `:free`)	~75 tok/s	You want a U.S.-built, fully-open model on the NVIDIA stack and the vendor/deployment story matters as much as the rank.

The throughline: DeepSeek proved an open model could just be your frontier agent — and do it for cents. GLM is the new quality leader. MiniMax owns multimodal-on-a-budget. NVIDIA brings a fully-open, U.S.-built option with the deepest pockets behind it. The gap to the closed frontier is real but narrow, and it has not been widening. Pick the corner of the cost / quality / modality / vendor box that matches your workload — and, as always, the only ground truth is testing it against your own tasks.

All pricing and throughput data taken from OpenRouter.ai in June 2026 (weighted-average across providers); intelligence scores from the Artificial Analysis Intelligence Index v4.1 indexed on June 25, 2026. Model pages: DeepSeek V4 Flash · GLM 5.2 · MiniMax M3 · Nemotron 3 Ultra.