ByteDance has introduced UI-TARS, an AI agent that controls computers and carries out complex workflows by interpreting graphical user interfaces. Trained on a massive dataset, UI-TARS outperforms competitors such as GPT-4o and Claude on a range of GUI benchmarks, showing stronger perception and comprehension in web and mobile environments.
The agent takes multimodal inputs and reasons step by step, explaining its actions as it navigates applications and completes tasks such as booking flights or installing software extensions. UI-TARS combines short-term and long-term memory with error-correction and post-reflection data, which lets it learn from mistakes and adapt to unforeseen situations.
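The loop described above (observe, reason, act, remember, correct) can be illustrated with a toy sketch. This is not UI-TARS code or its API: `ToyGuiAgent`, `FakeScreen`, `scripted_policy`, and the action strings are all hypothetical names invented for illustration, and a hand-written rule stands in for the model. It only shows how a short-term buffer of recent steps and a long-term list of failure notes might let an agent avoid repeating a mistake.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class Step:
    thought: str  # the agent's stated reasoning for this step
    action: str   # the action it chose
    ok: bool      # whether the environment accepted the action


@dataclass
class ToyGuiAgent:
    # Hypothetical stand-in for a model call:
    # (observation, recent steps, lessons) -> (thought, action)
    policy: Callable[[str, List[Step], List[str]], Tuple[str, str]]
    # Short-term memory: only the last few steps are kept.
    short_term: Deque[Step] = field(default_factory=lambda: deque(maxlen=5))
    # Long-term memory: notes about failures, kept for the whole run.
    long_term: List[str] = field(default_factory=list)

    def run(self, observe, act, max_steps: int = 10) -> List[Step]:
        trace: List[Step] = []
        for _ in range(max_steps):
            obs = observe()
            if obs == "done":
                break
            thought, action = self.policy(obs, list(self.short_term), self.long_term)
            ok = act(action)
            step = Step(thought, action, ok)
            self.short_term.append(step)
            trace.append(step)
            if not ok:
                # Error correction: record the failure so the policy can avoid it.
                self.long_term.append(f"'{action}' failed on '{obs}'")
        return trace


class FakeScreen:
    """A one-screen fake GUI: only 'click:Search' completes the task."""

    def __init__(self) -> None:
        self.state = "search_form"

    def observe(self) -> str:
        return self.state

    def act(self, action: str) -> bool:
        if self.state == "search_form" and action == "click:Search":
            self.state = "done"
            return True
        return False


def scripted_policy(obs: str, recent: List[Step], lessons: List[str]) -> Tuple[str, str]:
    # Try candidates in order, skipping any that a stored lesson marks as failed here.
    for action in ["click:Submit", "click:Search"]:
        if f"'{action}' failed on '{obs}'" not in lessons:
            return (f"On '{obs}', trying {action}", action)
    return ("No untried action left", "noop")


screen = FakeScreen()
agent = ToyGuiAgent(policy=scripted_policy)
trace = agent.run(screen.observe, screen.act)
for step in trace:
    print(step.thought, "->", "ok" if step.ok else "failed")
```

In this sketch the first attempt fails, the failure is written to long-term memory, and the second attempt picks a different action. A real agent would replace `scripted_policy` with a multimodal model reading screenshots, which is well beyond what this example shows.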
Together, these capabilities in autonomous learning and real-world interaction mark a significant advance in AI agent technology.