GLM-5.1 — Released April 7, 2026

GLM
General Language Model

From autoregressive blank-infilling to agentic engineering — the GLM family has evolved from a novel pretraining architecture into one of the world's top open-weight AI systems.

754B
Total Params (MoE)
200K
Context Window
#1
Open Model (LMArena)
95.3
AIME 2026
NEW — April 7, 2026

GLM-5.1

"Towards Long-Horizon Tasks" — GLM-5.1 is a next-generation flagship for agentic engineering. It achieves SOTA on SWE-Bench Pro and sustains productive optimization over hundreds of iterations and thousands of tool calls.

754B
total parameters
~40B active per token (MoE)
256
MoE experts
80 layers
28.5T
training tokens
3-stage context extension
200K
max context
up from 128K in GLM-4.5

Architecture Innovations

Multi-Head Latent Attention (MLA)

Compresses keys and values into a compact per-token latent vector, shrinking the KV cache for significant GPU memory savings and enabling longer contexts and more efficient inference.
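The gist of that compression, as a minimal PyTorch sketch with illustrative dimensions rather than GLM-5.1's real configuration: only a small latent per token is cached, and full keys/values are reconstructed from it at attention time (causal masking and RoPE are omitted for brevity).

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Sketch of the latent-attention idea: cache one small latent per token
    instead of per-head K/V, and up-project it when attending.
    Dimensions are illustrative, not GLM-5.1's actual configuration."""

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compression; only this output is cached
        self.k_up = nn.Linear(d_latent, d_model)     # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)     # reconstruct values from the latent
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        new_latent = self.kv_down(x)                 # (b, t, d_latent)
        latent = new_latent if latent_cache is None else torch.cat([latent_cache, new_latent], dim=1)

        def split(h):                                # (b, T, d_model) -> (b, heads, T, d_head)
            return h.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_up(latent)), split(self.v_up(latent))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(y), latent                   # causal mask / RoPE omitted for brevity

x = torch.randn(2, 16, 1024)
y, cache = LatentKVAttention()(x)
print(cache.shape)  # (2, 16, 128) latents cached vs. (2, 16, 2048) for full per-head K/V
```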

DeepSeek Sparse Attention

Reduces attention computation by 1.5–2× for long sequences without quality loss.
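A toy illustration of the general pattern behind this kind of sparse attention, assuming a cheap indexer has already scored the past tokens and each query attends only to its top-k keys; this is a conceptual sketch, not the actual DeepSeek or GLM kernel.

```python
import torch

def topk_sparse_attention(q, k, v, indexer_scores, top_k=64):
    """Toy top-k sparse attention: a cheap indexer has scored every past token
    for every query, and each query attends only to its top_k keys.
    Shapes: q (b, h, tq, d), k/v (b, h, tkv, d), indexer_scores (b, tq, tkv)."""
    b, h, tq, d = q.shape
    tkv = k.shape[2]
    top_k = min(top_k, tkv)

    idx = indexer_scores.topk(top_k, dim=-1).indices                     # (b, tq, top_k)
    idx = idx.unsqueeze(1).unsqueeze(-1).expand(b, h, tq, top_k, d)

    k_sel = k.unsqueeze(2).expand(b, h, tq, tkv, d).gather(3, idx)       # (b, h, tq, top_k, d)
    v_sel = v.unsqueeze(2).expand(b, h, tq, tkv, d).gather(3, idx)

    scores = (q.unsqueeze(3) * k_sel).sum(-1) / d ** 0.5                 # (b, h, tq, top_k)
    return (scores.softmax(dim=-1).unsqueeze(-1) * v_sel).sum(3)         # (b, h, tq, d)

# One decoding query attending to 64 of 512 cached positions.
q = torch.randn(1, 4, 1, 64)
k = torch.randn(1, 4, 512, 64)
v = torch.randn(1, 4, 512, 64)
out = topk_sparse_attention(q, k, v, indexer_scores=torch.randn(1, 1, 512))
```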

Multi-Token Prediction

Predicts the next two tokens at each step during inference; with an average acceptance length of 2.76, this yields faster speculative decoding.
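A back-of-the-envelope estimate of what that acceptance length buys, under the usual speculative-decoding assumption that one full verification pass commits roughly acceptance_length tokens and that drafting adds a small, assumed overhead:

```python
def mtp_decode_speedup(acceptance_length: float, draft_overhead: float = 0.1) -> float:
    """Rough speedup from speculative decoding with an MTP draft head.
    acceptance_length: average tokens committed per full-model verification
    pass (2.76 is the figure quoted above); draft_overhead is an assumed
    relative cost of producing the extra draft tokens, not a measured value."""
    return acceptance_length / (1.0 + draft_overhead)

print(f"{mtp_decode_speedup(2.76):.2f}x")  # ~2.5x fewer full forward passes per generated token
```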

The Longer It Runs, the Better It Gets

Unlike previous models that plateau early, GLM-5.1 sustains optimization over 600+ iterations and 6,000+ tool calls. In a vector search optimization task, it reached 21.5k QPS — 6× the best single-session result. It built a complete Linux desktop environment in a single 8-hour run.

Model Lineup

A Model for Every Need

From lightweight inference to flagship performance, the GLM family covers the full spectrum.

🧠

GLM-5 / 5.1

Latest

754B MoE (40B active). #1 open model on LMArena. 200K context. Competitive with Claude Opus 4.5, Gemini 3 Pro, GPT-5.2.

754B MoE 200K ctx Agentic Open Weights

GLM-4

Flagship v4

GPT-4 competitive. 128K context, strong bilingual performance, native tool use and function calling.

128K ctx Tool Use Bilingual
👁️

GLM-4V

Multimodal

Vision-capable variant that understands images alongside text. Built on CogVLM research.

Image Input CogVLM 128K ctx
💨

GLM-4-Air

Fast

Lighter and faster variant optimized for cost-sensitive production use without sacrificing quality.

Low Latency Cost Efficient API
🔓

GLM-4-9B

Open Source

9B model that outperforms Llama-3-8B. 1M context variant available. Permissive commercial license.

9B Params Open Weights 1M ctx variant
💻

CodeGeeX4

Code

Specialized code model based on GLM-4-9B. Strong HumanEval scores with a VS Code extension.

Code Gen VS Code HumanEval
What It Can Do

Key Capabilities

🤖

Agentic Engineering

GLM-5 autonomously solves multi-step engineering tasks across 10,000+ SWE environments in 9 languages. Supports 1,000+ concurrent agentic rollouts via the async "slime" RL framework.
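The slime framework's actual interfaces aren't documented here; the sketch below only shows the generic asyncio pattern for keeping on the order of a thousand rollouts in flight at once, with run_rollout standing in for a real agent-environment episode.

```python
import asyncio

async def run_rollout(env_id: int) -> float:
    """Stand-in for one agentic episode: a real framework would step an SWE
    environment and query the policy; here we only simulate some latency."""
    await asyncio.sleep(0.01)
    return 0.0  # episode reward

async def collect(num_envs: int = 1000, max_concurrent: int = 256) -> list[float]:
    """Keep up to max_concurrent rollouts in flight at any moment."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(env_id: int) -> float:
        async with sem:
            return await run_rollout(env_id)

    return await asyncio.gather(*(bounded(i) for i in range(num_envs)))

rewards = asyncio.run(collect())
print(len(rewards))  # 1000 completed episodes
```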

🇨🇳

Best-in-Class Bilingual

Consistently the top-performing model family for Chinese NLP tasks while maintaining strong English performance. Dominates C-Eval and CMMLU benchmarks.

🖼️

Multimodal Understanding

GLM-4V and CogVLM support image understanding, visual question answering, and text generation from images.

🔧

Native Tool Use & Search

GLM-5's search agent features hierarchical context management and a web knowledge graph with 2M+ pages. GLM-4 All-Tools autonomously invokes browsing, code interpreters, and drawing tools.

📏

Massive Context Window

200K tokens in GLM-5 (up from 128K). The GLM-4-9B-Chat-1M variant pushes to 1 million tokens. 3-stage training: 32K → 128K → 200K.

⌨️

Powerful Code Generation

CodeGeeX4 delivers strong HumanEval results with a VS Code extension. GLM-5 scores 77.8 on SWE-bench Verified and 56.2 on Terminal-Bench 2.0.

💭

3 Thinking Modes

GLM-5 supports interleaved thinking, preserved thinking, and turn-level thinking control, giving developers fine-grained control over reasoning depth and cost.

🎬

Slide & Media Generation

GLM-5 generates presentation slides via multi-level reward training for layout, rendering, and visual quality. CogView and CogVideoX handle image and video generation.

Under the Hood

Unique Architecture

GLM originated with autoregressive blank infilling — combining the best of GPT and BERT. GLM-5 builds on that foundation with a sparse MoE architecture and cutting-edge attention mechanisms.

➡️

GPT-Style

Causal / autoregressive LM. Predicts the next token. Great for generation, but context is strictly unidirectional.

↔️

BERT-Style

Masked LM. Predicts [MASK] tokens with bidirectional context. Great for understanding, not generation.

🔄

GLM-Style

Autoregressive blank infilling. Removes spans of text and autoregressively predicts them with both left and right context.
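A toy construction of one blank-infilling training example in the spirit of the original GLM objective; the Poisson-sampled span lengths, span shuffling, and 2D position ids of the real recipe are deliberately left out.

```python
import random

def blank_infilling_example(tokens, num_spans=2, max_span=4):
    """Toy GLM-style blank infilling: blank out spans in Part A and append them
    to Part B, where the model predicts them autoregressively. The real
    objective also samples span lengths from a Poisson distribution, shuffles
    the spans, and uses 2D position ids; none of that is reproduced here."""
    tokens = list(tokens)
    spans, taken = [], set()
    while len(spans) < num_spans:
        length = random.randint(1, max_span)
        start = random.randint(0, len(tokens) - length)
        if any(i in taken for i in range(start, start + length)):
            continue  # keep spans non-overlapping (retry)
        taken.update(range(start, start + length))
        spans.append((start, length))
    spans.sort()

    # Part A: the corrupted source, each span collapsed to a single [MASK].
    part_a, cursor = [], 0
    for start, length in spans:
        part_a += tokens[cursor:start] + ["[MASK]"]
        cursor = start + length
    part_a += tokens[cursor:]

    # Part B: the blanked spans, each generated left-to-right after [START].
    part_b = []
    for start, length in spans:
        part_b += ["[START]"] + tokens[start:start + length]

    # Part A is attended bidirectionally; Part B is predicted autoregressively.
    return part_a, part_b

a, b = blank_infilling_example("the glm objective removes spans and predicts them with both-side context".split())
print(a, b)
```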

GLM-5 Training Pipeline

Base Pretraining (28.5T tokens) → SFT (supervised fine-tuning) → Reasoning RL (GRPO + IcePop) → Agentic RL (1,000+ concurrent rollouts) → GLM-5 (distilled)
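A minimal sketch of the GRPO step named in the pipeline, assuming the standard formulation: advantages are computed relative to a group of rollouts for the same prompt (no learned value model) and plugged into a PPO-style clipped loss. IcePop is not covered here.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: G rollouts for one prompt are scored against
    the group mean/std, so no learned value model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def grpo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, with each rollout's group-relative
    advantage broadcast over its tokens."""
    ratio = (logp_new - logp_old).exp()
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps)
    return -torch.minimum(ratio * advantages, clipped * advantages).mean()

# Toy usage: 8 rollouts for one prompt, binary rewards, 16 tokens each.
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
adv = grpo_advantages(rewards).unsqueeze(-1)        # (8, 1), broadcasts over tokens
logp_old = torch.randn(8, 16)
logp_new = logp_old + 0.01 * torch.randn(8, 16)
loss = grpo_policy_loss(logp_new, logp_old, adv)
```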
Performance

Benchmark Highlights

First open-weights model to score 50 on the Artificial Analysis Intelligence Index v4.0. Competitive with Claude Opus 4.5, Gemini 3 Pro, and GPT-5.2.

GLM-5.1 Benchmarks

| Benchmark | What It Tests | GLM-5.1 | GLM-5 |
| --- | --- | --- | --- |
| AIME 2026 | Advanced math competition | 95.3 | 95.4 |
| GPQA-Diamond | Graduate-level reasoning | 86.2 | 86.0 |
| HLE | Humanity's Last Exam | 31.0 (52.3 w/ tools) | 30.5 (50.4) |
| SWE-Bench Pro | Complex software engineering | 58.4 | 55.1 |
| NL2Repo | Repo generation | 42.7 | 35.9 |
| Terminal-Bench 2.0 | Real-world terminal tasks | 69.0 | 56.2 |
| CyberGym | Cybersecurity tasks | 68.7 | 48.3 |
| BrowseComp | Web browsing accuracy | 68.0 (79.3 w/ ctx mgmt) | 62.0 (75.9) |
| τ³-Bench | Agent task completion | 70.6 | 69.2 |

Competitive Landscape

| Benchmark | What It Tests | GLM-5.1 | Opus 4.6 | GPT-5.4 |
| --- | --- | --- | --- | --- |
| SWE-Bench Pro | Complex software engineering | 58.4 | 57.3 | 57.7 |
| NL2Repo | Repo generation | 42.7 | 49.8 | 41.3 |
| Terminal-Bench 2.0 | Real-world terminal tasks | 63.5 | 65.4 | 75.1 |
| AIME 2026 | Advanced math | 95.3 | 95.6 | 98.7 |
| GPQA-Diamond | Graduate-level reasoning | 86.2 | 91.3 | 92 |

GLM-5.1 benchmarks from z.ai/blog/glm-5.1 (Apr 2026). Competitive landscape scores from the same report.

History

The GLM Family Timeline

Open Source

Built in the Open

THUDM releases models, training code, and research openly. The entire GLM ecosystem is available on GitHub and HuggingFace.

🏗️

7 Chinese Chip Platforms Supported

GLM-5 is fully adapted to Huawei Ascend, Moore Threads, Hygon, Cambricon, Kunlunxin, MetaX, and Enflame — ensuring broad hardware accessibility.