The Qwen team at Alibaba has officially released Qwen3.6-35B-A3B, a powerful new open-weight model that marks the debut of the Qwen3.6 generation. Released under the Apache 2.0 license, this model utilizes a sparse Mixture of Experts (MoE) architecture to deliver high-level performance while maintaining significant computational efficiency. By activating only 3 billion of its 35 billion total parameters during inference, the model provides agentic coding and multimodal capabilities that compete with much larger dense models.
Architecture and Efficiency
The model’s efficiency stems from its MoE design, which routes input tokens through a small subset of specialized sub-networks rather than engaging the entire parameter set. Specifically, the architecture features 256 total experts, with 8 routed experts and 1 shared expert activated per token. This structure is supported by a unique layout of 40 layers, alternating between Gated DeltaNet for linear attention and Gated Attention sublayers using Grouped Query Attention. This configuration significantly reduces KV-cache memory pressure, allowing for a native context length of 262,144 tokens, which can be extended up to 1,010,000 tokens using YaRN scaling.
Agentic Coding and Reasoning
Qwen3.6-35B-A3B demonstrates exceptional proficiency in software engineering and complex reasoning tasks. On the SWE-bench Verified benchmark, the model achieved a score of 73.4, and it set a new standard on Terminal-Bench 2.0 with a score of 51.5, outperforming several competing models. Its frontend code generation capabilities are particularly notable, reaching a score of 1397 on QwenWebBench. Furthermore, the model shows strong academic reasoning, scoring 92.7 on AIME 2026 and 86.0 on the GPQA Diamond graduate-level scientific benchmark.
Multimodal Performance and Reasoning Control
Beyond text, the model includes a native vision encoder capable of processing images, documents, and video. It outperformed Claude-Sonnet-4.5 and Gemma4-31B across several benchmarks, including an 81.7 score on MMMU, 85.3 on RealWorldQA, and 83.7 on VideoMMMU. To manage these capabilities, the model introduces explicit control over reasoning behavior. While it operates in a thinking mode by default, developers can disable this via API parameters. A new Thinking Preservation feature also allows the model to retain and leverage reasoning traces from historical messages, enhancing consistency and efficiency in multi-step agent workflows.
The model is compatible with major inference frameworks, including SGLang, vLLM, KTransformers, and Hugging Face Transformers.

Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!