From autoregressive blank-infilling to agentic engineering — the GLM family has evolved from a novel pretraining architecture into one of the world's top open-weight AI systems.
"Towards Long-Horizon Tasks" — GLM-5.1 is a next-generation flagship for agentic engineering. It achieves SOTA on SWE-Bench Pro and sustains productive optimization over hundreds of iterations and thousands of tool calls.
Compresses the KV cache into fewer vectors per token, yielding significant GPU memory savings and enabling longer contexts and more efficient inference.
Reduces attention computation by 1.5–2× for long sequences without quality loss.
Predicts the next two tokens simultaneously during decoding, reaching a mean acceptance length of 2.76 tokens per step for faster generation.
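As a rough back-of-the-envelope illustration (assuming a draft-and-verify multi-token-prediction scheme; the exact decoding setup is not specified here), the mean acceptance length translates into decoding speedup roughly like this:

```python
def mtp_speedup(acceptance_length: float, draft_overhead: float = 0.0) -> float:
    """Estimate decoding speedup from multi-token prediction.

    With a mean acceptance length of L tokens per forward pass, one
    verification pass emits L tokens instead of 1, so throughput scales
    by roughly L / (1 + draft_overhead), where draft_overhead is the
    relative cost of the extra prediction heads (assumed small).
    """
    return acceptance_length / (1.0 + draft_overhead)

# Reported mean acceptance length of 2.76
print(round(mtp_speedup(2.76), 2))        # ideal speedup with free draft heads
print(round(mtp_speedup(2.76, 0.15), 2))  # speedup if draft heads add 15% cost
```

The `draft_overhead` values are illustrative assumptions; the real gain depends on how cheap the extra prediction heads are relative to a full forward pass.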
Unlike previous models that plateau early, GLM-5.1 sustains optimization over 600+ iterations and 6,000+ tool calls. In a vector search optimization task, it reached 21.5k QPS — 6× the best single-session result. It built a complete Linux desktop environment in a single 8-hour run.
From lightweight inference to flagship performance, the GLM family covers the full spectrum.
754B MoE (40B active). #1 open model on LMArena. 200K context. Competitive with Claude Opus 4.5, Gemini 3 Pro, GPT-5.2.
Competitive with GPT-4. 128K context, strong bilingual performance, native tool use and function calling.
Vision-capable variant that understands images alongside text. Built on CogVLM research.
Lighter and faster variant optimized for cost-sensitive production use without sacrificing quality.
9B model that outperforms Llama-3-8B. 1M context variant available. Permissive commercial license.
Specialized code model based on GLM-4-9B. Strong HumanEval scores with a VS Code extension.
GLM-5 autonomously solves multi-step engineering tasks across 10,000+ SWE environments in 9 languages. Supports 1,000+ concurrent agentic rollouts via the async "slime" RL framework.
Consistently the top-performing model family for Chinese NLP tasks while maintaining strong English performance. Dominates C-Eval and CMMLU benchmarks.
GLM-4V and CogVLM support image understanding, visual question answering, and text generation from images.
GLM-5's search agent features hierarchical context management and a web knowledge graph with 2M+ pages. GLM-4 All-Tools autonomously invokes browsing, code interpreters, and drawing tools.
200K tokens in GLM-5 (up from 128K). The GLM-4-9B-Chat-1M variant pushes to 1 million tokens. 3-stage training: 32K → 128K → 200K.
CodeGeeX4 delivers strong HumanEval results with a VS Code extension. GLM-5 scores 77.8 on SWE-bench Verified and 56.2 on Terminal-Bench 2.0.
GLM-5 supports interleaved, preserved, and turn-level thinking control — giving developers fine-grained control over reasoning depth and cost.
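A minimal sketch of how such a control surface might look from a client's side. This is hypothetical: the parameter names (`thinking`, `mode`, `effort`) are illustrative assumptions, not the documented z.ai API.

```python
def build_chat_request(prompt: str, thinking_mode: str = "interleaved",
                       effort: str = "medium") -> dict:
    """Assemble an OpenAI-compatible chat payload with a hypothetical
    `thinking` block controlling reasoning depth and cost.

    The three mode names mirror the controls described above; the JSON
    shape itself is an assumption, not the official API schema.
    """
    if thinking_mode not in {"interleaved", "preserved", "turn-level"}:
        raise ValueError(f"unknown thinking mode: {thinking_mode}")
    return {
        "model": "glm-5",
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"mode": thinking_mode, "effort": effort},
    }

req = build_chat_request("Refactor this module.", thinking_mode="turn-level")
print(req["thinking"]["mode"])  # turn-level
```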
GLM-5 generates presentation slides via multi-level reward training for layout, rendering, and visual quality. CogView and CogVideoX handle image and video generation.
GLM originated with autoregressive blank infilling — combining the best of GPT and BERT. GLM-5 builds on that foundation with a sparse MoE architecture and cutting-edge attention mechanisms.
Causal / autoregressive LM. Predicts the next token. Great for generation, unidirectional.
Masked LM. Predicts [MASK] tokens with bidirectional context. Great for understanding, not generation.
Autoregressive blank infilling. Removes spans of text and autoregressively predicts them with both left and right context.
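The span-corruption idea can be sketched in a few lines. A minimal sketch, assuming single-span corruption with placeholder special tokens (the token names and span choice are illustrative, not GLM's actual tokenizer or sampling scheme):

```python
def blank_infill_example(tokens, span_start, span_len,
                         mask="[MASK]", start="[S]"):
    """Build a GLM-style blank-infilling training pair.

    Part A: the input with one span replaced by a mask token, which the
    model attends to bidirectionally (like BERT).
    Part B: the removed span, predicted left-to-right (like GPT),
    prefixed with a start token.
    """
    span = tokens[span_start:span_start + span_len]
    part_a = tokens[:span_start] + [mask] + tokens[span_start + span_len:]
    part_b = [start] + span
    return part_a, part_b

tokens = ["the", "cat", "sat", "on", "the", "mat"]
part_a, part_b = blank_infill_example(tokens, span_start=2, span_len=2)
print(part_a)  # ['the', 'cat', '[MASK]', 'the', 'mat']
print(part_b)  # ['[S]', 'sat', 'on']
```

Training maximizes the likelihood of Part B's tokens autoregressively, conditioned on all of Part A — which is how the objective gets bidirectional context *and* generative decoding in one model.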
First open-weights model to score 50 on the Artificial Analysis Intelligence Index v4.0. Competitive with Claude Opus 4.5, Gemini 3 Pro, and GPT-5.2.
| Benchmark | What It Tests | GLM-5.1 | GLM-5 |
|---|---|---|---|
| AIME 2026 | Advanced math competition | 95.3 | 95.4 |
| GPQA-Diamond | Graduate-level reasoning | 86.2 | 86.0 |
| HLE | Humanity's Last Exam | 31.0 (52.3 w/ tools) | 30.5 (50.4 w/ tools) |
| SWE-Bench Pro | Complex software engineering | 58.4 | 55.1 |
| NL2Repo | Repo generation | 42.7 | 35.9 |
| Terminal-Bench 2.0 | Real-world terminal tasks | 69.0 | 56.2 |
| CyberGym | Cybersecurity tasks | 68.7 | 48.3 |
| BrowseComp | Web browsing accuracy | 68.0 (79.3 w/ ctx mgmt) | 62.0 (75.9 w/ ctx mgmt) |
| τ³-Bench | Agent task completion | 70.6 | 69.2 |
| Benchmark | What It Tests | GLM-5.1 | Opus 4.6 | GPT-5.4 |
|---|---|---|---|---|
| SWE-Bench Pro | Complex software engineering | 58.4 | 57.3 | 57.7 |
| NL2Repo | Repo generation | 42.7 | 49.8 | 41.3 |
| Terminal-Bench 2.0 | Real-world terminal tasks | 63.5 | 65.4 | 75.1 |
| AIME 2026 | Advanced math | 95.3 | 95.6 | 98.7 |
| GPQA-Diamond | Graduate-level reasoning | 86.2 | 91.3 | 92.0 |
GLM-5.1 benchmarks from z.ai/blog/glm-5.1 (Apr 2026). Competitive landscape scores from the same report.
130B bilingual model. One of the first large open bilingual models.
First open chatbot model. Hugely popular in China.
32K context, FlashAttention. Better inference speed.
Added function calling and code interpreter.
Flagship. 128K context, multimodal, tool use, GPT-4 competitive.
Open-source 9B. Outperforms Llama-3-8B. 1M context variant.
754B MoE, 200K context, #1 open model on LMArena. Agentic engineering, 3 thinking modes, search agent.
Next-gen flagship for long-horizon agentic tasks. SOTA on SWE-Bench Pro (58.4), CyberGym (68.7). MIT License. 21.5k QPS on vector search over 600+ iterations.
THUDM releases models, training code, and research openly. The entire GLM ecosystem is available on GitHub and HuggingFace.
754B open-weights on HuggingFace. MIT License. Full-precision and FP8. Compatible with vLLM and SGLang.
Open weights on HuggingFace & ModelScope. Permissive commercial license.
Code-specialized model with VS Code extension. Strong HumanEval performance.
GLM-5 is fully adapted to Huawei Ascend, Moore Threads, Hygon, Cambricon, Kunlunxin, MetaX, and Enflame — ensuring broad hardware accessibility.