
Ling

Ling is a general-purpose large language model series independently developed and open-sourced by Ant Group. Built on the MoE (Mixture of Experts) architecture, it has been validated on domestic heterogeneous computing platforms, achieved a breakthrough at the trillion-parameter scale, and continues to evolve with long-context modeling and AI Agent collaborative reasoning capabilities.


Why Choose Ling?

The Ling series achieves key breakthroughs in three dimensions: inference efficiency, long context modeling, and training technology:

Inference Efficiency

Traditional Dense models require activating all parameters during inference, with computational overhead proportional to model size. Ling’s MoE architecture routes each token to only the most relevant expert subnetworks, compressing the actual computation to a small fraction of a dense model’s cost while maintaining large-scale model capacity.
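The routing idea can be shown with a minimal sketch of top-k expert routing in PyTorch. This illustrates the general MoE technique, not Ling’s internal implementation; all dimensions and module shapes here are toy assumptions:

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate, experts, k=2):
    """Route each token to its top-k experts and combine their outputs.

    x:       (num_tokens, d_model) token activations
    gate:    linear layer producing one logit per expert
    experts: list of small feed-forward subnetworks
    """
    logits = gate(x)                              # (num_tokens, num_experts)
    weights, idx = torch.topk(logits, k, dim=-1)  # pick top-k experts per token
    weights = F.softmax(weights, dim=-1)          # normalize over chosen experts

    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
        if rows.numel() == 0:
            continue                              # expert inactive for this batch
        out[rows] += weights[rows, slot].unsqueeze(-1) * expert(x[rows])
    return out

# Toy usage: 2 of 8 experts active per token, so only ~1/4 of expert compute runs.
d_model, num_experts = 64, 8
gate = torch.nn.Linear(d_model, num_experts)
experts = [torch.nn.Sequential(torch.nn.Linear(d_model, 256), torch.nn.ReLU(),
                               torch.nn.Linear(256, d_model)) for _ in range(num_experts)]
tokens = torch.randn(10, d_model)
print(moe_forward(tokens, gate, experts).shape)  # torch.Size([10, 64])
```

The key design point is that only the selected experts execute for a given token, so compute scales with activated parameters rather than total parameters.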

Long Context Support

Building on MoE’s efficient inference, the Ling 2.6 series further advances long-sequence modeling.

  • Ling-2.6-1T: Natively supports a context window of up to 1M tokens; the official API currently exposes a 256K context window.

  • Ling-2.6-flash: Native 256K context window, capable of processing approximately 200,000 characters of long-form input.

  • Long-range information retrieval without noticeable degradation: the model reliably retrieves information regardless of whether it appears at the beginning, middle, or end of the context.

Typical Use Cases: Legal contract review, academic literature synthesis, large codebase comprehension, multi-turn long-form dialogue.
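As a sketch of how long-form input might be sent, assuming an OpenAI-compatible chat endpoint (the base URL, key, and model identifier below are placeholders; consult the API reference for actual values):

```python
from openai import OpenAI

# Placeholder endpoint and key; substitute the values from your API console.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")

with open("contract.txt", "r", encoding="utf-8") as f:
    document = f.read()  # e.g. a long legal contract, up to the 256K window

response = client.chat.completions.create(
    model="Ling-2.6-flash",  # model name as exposed by the API may differ
    messages=[
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user",
         "content": f"Review the following contract and list risky clauses:\n\n{document}"},
    ],
)
print(response.choices[0].message.content)
```

Because retrieval quality holds across the window, the document can be passed whole rather than chunked, which avoids losing cross-references between distant sections.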

Training Technology: FP8 End-to-End Training

To support efficient iteration of 1T parameter models, Ling pioneered FP8 end-to-end mixed-precision training at 1T parameter scale in the open-source community:

  • 30%–40% training throughput improvement compared to the BF16 baseline.

  • Significantly reduced GPU memory usage, enabling larger batch sizes and higher training parallelism.

  • Numerical stability meets production-grade standards, with loss curves consistent with BF16 training.
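Conceptually, FP8 training stores tensors in an 8-bit floating-point format (e.g. E4M3) and uses per-tensor scale factors to fit values into its narrow dynamic range. A minimal pure-PyTorch sketch of the quantize/dequantize round trip follows; it illustrates the idea only and is not Ling’s actual training stack:

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8 E4M3

def fp8_quantize(x: torch.Tensor):
    """Scale a tensor into E4M3 range, cast to FP8, and return (fp8, scale)."""
    scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)  # per-tensor scale factor
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)        # 1 byte/elem vs 2 for BF16
    return x_fp8, scale

def fp8_dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate higher-precision tensor for accumulation."""
    return x_fp8.to(torch.float32) / scale

w = torch.randn(1024, 1024)
w_fp8, s = fp8_quantize(w)
err = (w - fp8_dequantize(w_fp8, s)).abs().max().item()
print(f"storage: {w_fp8.element_size()} byte/elem, max round-trip error: {err:.4f}")
```

Halving storage relative to BF16 is what frees memory for larger batches; the scale factors are what keep the loss curve consistent despite the reduced precision.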


Model Selection Guide

The following is a capability comparison of the two models in the Ling series to help you evaluate and select:

| Model | Positioning | Context | Tool Calling | API Access |
| --- | --- | --- | --- | --- |
| Ling-2.6-1T | Latest flagship model; supports 1M context length; delivers the full chain from logical reasoning to task execution with minimal compute overhead (1/4 token cost) | 1M | Supported | Available |
| Ling-2.6-flash | Next-generation cost-effective model; 7.4B activated parameters; outperforms same-scale Dense models | 256K | Supported | Available |

Selection Suggestions:

  • For most general-purpose scenarios, start with Ling-2.6-flash — its MoE sparse activation mechanism delivers strong reasoning capability while significantly reducing inference compute overhead.
  • Need to handle ultra-long documents, complex multi-hop reasoning, or Agent task chains? Choose Ling-2.6-1T: its hybrid MLA and Linear Attention architecture replaces cumbersome “slow thinking” with a “fast thinking” mechanism, reaching results with minimal token overhead and dramatically reducing output cost.
  • For online service scenarios sensitive to inference latency and throughput, Ling-2.6-flash’s low activated parameter count gives it superior TTFT (Time to First Token) and TPS (Tokens Per Second) performance.
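TTFT and TPS can be measured directly against a streaming endpoint. A minimal sketch, again assuming an OpenAI-compatible streaming API (base URL, key, and model identifier are placeholders):

```python
import time
from openai import OpenAI

# Placeholder endpoint and key; substitute real values from your console.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")

start = time.perf_counter()
first_token_at = None
n_chunks = 0

stream = client.chat.completions.create(
    model="Ling-2.6-flash",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        n_chunks += 1  # chunk count roughly approximates token count

elapsed = time.perf_counter() - (first_token_at or start)
print(f"TTFT: {first_token_at - start:.3f}s, ~TPS: {n_chunks / max(elapsed, 1e-9):.1f}")
```

Counting stream chunks only approximates tokens; for precise TPS, use the token usage the API reports, if available.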

Model Details

Next, let’s dive into the technical characteristics and use cases of each model:

Ling-2.6-1T

Ling-2.6-1T is the latest-generation flagship model in the Ling series. It uses a Hybrid architecture combining MLA and Linear Attention, with approximately 1T total parameters and 63B activated parameters, supports a 1M ultra-long context, and delivers a strong balance between flagship-level intelligence and token efficiency.

Advantages:

  • Fast thinking replaces lengthy reasoning chains, preserving 1T-parameter-level capability at lower token cost.
  • Stronger coding and Agent capabilities, achieving open-source SOTA on reasoning and execution benchmarks such as AIME26 and SWE-bench Verified.
  • Highly compatible with mainstream Agent frameworks including Claude Code, OpenCode, and OpenClaw, covering multi-tool, multi-step, and multi-constraint scenarios.

Use Cases:

  • Multi-step tasks and agent tasks
  • Code completion, project delivery, and bug fixing
  • Data visualization projects such as slides and reports
  • Long-context knowledge management and automated workflows
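As an illustration of the tool-calling pattern behind these agent use cases, here is a sketch using OpenAI-style function calling. The endpoint, key, and the `run_tests` tool are hypothetical, not a Ling-specific API:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")  # placeholders

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool for a bug-fixing agent loop
        "description": "Run the project's test suite and return failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test file or directory"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="Ling-2.6-1T",
    messages=[{"role": "user", "content": "Fix the failing tests under tests/parser/."}],
    tools=tools,
)

# If the model decides to call the tool, execute it and feed the result back
# as a "tool" message in the next turn; agent frameworks automate this loop.
calls = response.choices[0].message.tool_calls
if calls:
    args = json.loads(calls[0].function.arguments)
    print("model requested:", calls[0].function.name, args)
```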

Ling-2.6-flash

Ling-2.6-flash is the latest cost-effective model in the Ling series, built on MoE architecture with 104B total parameters and 7.4B activated parameters, achieving the optimal balance between inference performance and compute cost.

Advantages:

  • Aggregate benchmark performance comparable to or exceeding 40B-class Dense models.
  • Low activated parameter count yields superior inference throughput, ideal for high-concurrency online services.
  • Supports 256K ultra-long context with full tool calling and Agent collaboration capabilities.

Use Cases:

  • Intelligent customer service and multi-turn dialogue systems
  • Content generation and copywriting
  • Real-time translation and text processing
  • Semantic understanding modules in recommendation systems
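For high-concurrency scenarios such as customer service, requests can be issued concurrently. A minimal asyncio sketch, once more assuming an OpenAI-compatible endpoint with placeholder URL, key, and model name:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")  # placeholders

async def answer(question: str) -> str:
    resp = await client.chat.completions.create(
        model="Ling-2.6-flash",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

async def main():
    questions = ["Where is my order?", "How do I reset my password?", "Cancel my plan."]
    # Issue all requests concurrently; low activated parameters keep per-call latency low.
    replies = await asyncio.gather(*(answer(q) for q in questions))
    for q, r in zip(questions, replies):
        print(q, "->", r)

asyncio.run(main())
```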

Ling’s Evolution

Ling has undergone a complete technical evolution from computing platform validation to Agent capability breakthroughs:

| Time | Version | Key Technical Breakthroughs |
| --- | --- | --- |
| 2025.03 | Ling 1.0 Series | Validated the engineering feasibility of MoE large language models on non-high-end heterogeneous computing platforms (non-A100/H100), completing domestic computing platform adaptation |
| 2025.10 | Ling-1T (2.0 Series) | First breakthrough at trillion-parameter scale; introduced FP8 end-to-end training, significantly improving training efficiency and achieving cross-domain generalization |
| 2026.02 | Ling 2.5 Series | High-throughput decoding optimization; long-context understanding reached industry-leading levels; initial Agent interaction and tool-calling foundation |
| 2026.04 | Ling 2.6 Series | Completed the full chain from logical reasoning to task execution; significantly improved “intelligence-efficiency ratio” (task completion per unit compute), opening the era of cost-effective, scalable Agent applications |

Evolution Path

Ling’s iterations are not just linear parameter scale increases — they represent step-level leaps in core capability dimensions:

  1. Ling 1.0 — Engineering validation: proved that MoE large models can achieve high-quality training on domestic computing platforms.
  2. Ling 2.0 — Efficiency breakthrough: FP8 training system established, single-device training efficiency significantly improved.
  3. Ling 2.5 — Context modeling breakthrough: from short text understanding to end-to-end modeling of ultra-long documents.
  4. Ling 2.6 — Agent capability breakthrough: from single-turn Q&A to low-cost, scalable autonomous execution of complex tasks.

Technology Ecosystem

Based on the Ling foundation model, we have built a complete technology ecosystem covering the full pipeline from training to deployment:

  • High-Performance Operator Library: Open-sourced high-performance training and inference operator system, covering core components such as MoE routing and attention computation, supporting full-pipeline optimization from pre-training to online inference.
  • Vertical Domain Models: Domain-adapted models for specialized scenarios such as healthcare and finance, with superior performance on domain knowledge-intensive tasks.
  • Open Source Community: All research results are open-sourced to the Inclusion AI community, continuously co-building the ecosystem with developers.

Quick Start

  1. Create API Key: Obtain access credentials.
  2. Make Your First Call: Complete your first API request in 5 minutes.
  3. Explore More Capabilities: Learn more about advanced features such as inference optimization and long context.
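Assuming the service exposes an OpenAI-compatible endpoint (check the API reference; the base URL and model name below are placeholders), a first call can be as short as:

```python
from openai import OpenAI

# Placeholders: substitute the base URL and the API key you just created.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="Ling-2.6-flash",  # exact model name may differ in your console
    messages=[{"role": "user", "content": "Hello, Ling!"}],
)
print(resp.choices[0].message.content)
```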