Alibaba’s Qwen team has released Qwen3.7-Plus, a new multimodal large language model available to international developers through the Bailian platform, known outside China as Model Studio. The release follows the debut of the Qwen3.7 generation in May and marks a shift toward what Alibaba calls multimodal hybrid agent technology: models built to carry out long-running, complex tasks rather than return single text responses.
Multimodal Capabilities and Vision Benchmarks
Qwen3.7-Plus processes images and video alongside written prompts. The model is strictly for visual understanding: it reads and interprets media but does not generate images or video, as those functions are handled by separate model families within Alibaba’s ecosystem.
The model has already been ranked on Vision Arena, a neutral leaderboard managed by LM Arena. In its preview phase, Qwen3.7-Plus placed #16 overall, which put Alibaba at #5 among labs in the vision category. This result is most relevant for tasks such as large-scale optical character recognition (OCR), chart reading, and video-frame analysis.
Agentic Features and Autonomous Iteration
The core of the Qwen3.7-Plus release is its focus on agentic workflows. Beyond simple input processing, the model integrates five specific capabilities: deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration. Through self-programming, the model can write and revise its own code, while tool invocation allows it to interact with external functions or APIs. The autonomous iteration feature enables the model to loop through tasks until they are completed, supported by verification and testing to check the validity of its outputs.
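The interplay between autonomous iteration and verification can be illustrated with a minimal sketch. This is not Alibaba's implementation: `run_agent_loop`, `make_stub_model`, and `verify_product` are hypothetical names, and the stub stands in for a real model call. The loop asks for a candidate, checks it, and feeds the failure message back until the check passes or an iteration budget runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LoopResult:
    answer: Optional[str]          # accepted candidate, or None if the budget ran out
    attempts: int                  # number of iterations actually used
    history: List[str] = field(default_factory=list)


def run_agent_loop(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Propose, verify, and retry with feedback until success or budget exhaustion."""
    history: List[str] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_iterations + 1):
        candidate = propose(feedback)        # hypothetical stand-in for a model call
        ok, feedback = verify(candidate)     # verification and testing step
        history.append(f"attempt {attempt}: {candidate!r} -> {'pass' if ok else feedback}")
        if ok:
            return LoopResult(candidate, attempt, history)
    return LoopResult(None, max_iterations, history)


def make_stub_model(drafts: List[str]) -> Callable[[Optional[str]], str]:
    """Assumption: a fake model that replays canned drafts instead of calling an API."""
    remaining = iter(drafts)

    def propose(feedback: Optional[str]) -> str:
        return next(remaining, "")

    return propose


def verify_product(candidate: str) -> Tuple[bool, str]:
    """Toy verifier: accept only the correct value of 17 * 3."""
    if candidate == str(17 * 3):
        return True, ""
    return False, f"{candidate!r} is not the product of 17 and 3"


result = run_agent_loop(make_stub_model(["41", "51"]), verify_product)
print(result.answer, result.attempts)
```

The iteration cap matters as much as the verifier: without it, a model that never produces a passing candidate would loop indefinitely.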
To support these agentic functions, the Bailian platform provides an agentic reinforcement learning mechanism that uses feedback from real-world execution to improve the model's accuracy over time. The platform also includes built-in safety guardrails that keep autonomous tools within preset operational limits, a necessary feature when an agent is tasked with editing files or executing commands.
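The article does not describe how Bailian's guardrails work internally. As a rough illustration of what "preset operational limits" can mean for an agent that edits files and runs commands, the sketch below checks each request against a command allowlist and a workspace directory. `ALLOWED_COMMANDS`, `WORKSPACE`, and both function names are assumptions made for this example.

```python
import shlex
from pathlib import Path

# Hypothetical limits chosen for illustration only.
ALLOWED_COMMANDS = {"ls", "cat", "pytest"}
WORKSPACE = Path("/tmp/agent_workspace")


def is_command_allowed(command_line: str) -> bool:
    """Allow a command only if its executable is on the allowlist.

    Assumes the command is later run as an argument list without a shell,
    so operators such as '&&' are never interpreted.
    """
    try:
        parts = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes and similar malformed input
    return bool(parts) and parts[0] in ALLOWED_COMMANDS


def is_path_allowed(path: str, workspace: Path = WORKSPACE) -> bool:
    """Allow a path only if it resolves to a location inside the workspace."""
    root = workspace.resolve()
    target = (root / path).resolve()  # collapses '..' and handles absolute paths
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True


print(is_command_allowed("ls -la"), is_command_allowed("rm -rf /"))
print(is_path_allowed("notes/todo.txt"), is_path_allowed("../secrets.txt"))
```

Resolving the path before comparing it is the key step, since a plain string prefix check would accept inputs such as `notes/../../etc/passwd`.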
Strategic Positioning
Qwen3.7-Plus serves as the multimodal counterpart to the text-only Qwen3.7-Max. While the Plus model focuses on visual and agentic tasks, its Max sibling provides the foundation for the generation's reasoning capabilities. At release, Qwen3.7-Max scored 56.6 on the Artificial Analysis Intelligence Index, the highest placement for a Chinese model at that time. By offering these models through the Bailian platform, Alibaba gives developers an API-based backend for workloads that combine visual input, deep reasoning, and autonomous tool use.
