Microsoft AI Launches Three Multimodal Models to Rival OpenAI

Key Takeaways

  • Microsoft is diversifying its AI strategy by launching proprietary models that compete directly with its partner, OpenAI.
  • The new MAI suite offers a cost-effective, high-performance alternative for developers and enterprises seeking multimodal capabilities.
  • Mustafa Suleyman’s "Humanist AI" approach signals a shift toward practical, communication-focused model development.

Microsoft AI, the research division led by CEO Mustafa Suleyman, has announced the release of three new foundational AI models capable of generating text, voice, and images. The launch marks a significant step in Microsoft’s strategy to build its own stack of multimodal models and compete directly with rival AI labs, even as the company maintains its long-standing partnership with OpenAI.

New Multimodal Capabilities

The three models, developed by the MAI Superintelligence team, each handle a distinct media task. MAI-Transcribe-1 transcribes speech into text across 25 languages and is reported to be 2.5 times faster than Microsoft’s existing Azure Fast offering. MAI-Voice-1 is an audio-generating model that lets users create custom voices and can generate 60 seconds of audio in one second. Finally, MAI-Image-2, an image-generating model, was initially released on the MAI Playground testing platform on March 19.

All three models are now available on Microsoft Foundry. The transcription and voice models have also been added to the MAI Playground.

Strategic Positioning and Pricing

Microsoft AI is positioning these models as a cost-effective alternative to offerings from Google and OpenAI. MAI-Transcribe-1 is priced at $0.36 per hour of audio and MAI-Voice-1 at $22 per 1 million characters. MAI-Image-2 is priced separately for input and output: $5 per 1 million tokens of text input and $33 per 1 million tokens of image output.

Mustafa Suleyman described the company’s approach as "Humanist AI," emphasizing human-centric communication and practical utility. Suleyman said the company plans to release further models through Foundry and integrate them directly into Microsoft products and experiences.
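
To make the per-unit prices quoted above concrete, the following is a minimal sketch of a cost estimator in Python. Only the four rates come from the article; the function names, constant names, and the sample workload are hypothetical, and the sketch assumes simple linear billing with no minimums, rounding rules, or volume discounts, which the article does not confirm.

```python
# Rates as quoted in the article, in US dollars.
# All names below are illustrative and not part of any Microsoft API.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TEXT_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_IMAGE_OUTPUT_TOKENS = 33.0

MILLION = 1_000_000


def transcribe_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a number of characters."""
    return characters / MILLION * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost: text input and image output are billed at different rates."""
    input_cost = text_input_tokens / MILLION * IMAGE_USD_PER_MILLION_TEXT_INPUT_TOKENS
    output_cost = image_output_tokens / MILLION * IMAGE_USD_PER_MILLION_IMAGE_OUTPUT_TOKENS
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical monthly workload, chosen only for illustration.
    print(f"Transcription, 100 audio hours: ${transcribe_cost(100):.2f}")
    print(f"Voice, 500,000 characters:      ${voice_cost(500_000):.2f}")
    print(f"Image, 200k in / 1M out tokens: ${image_cost(200_000, 1_000_000):.2f}")
```

Under these assumptions, 100 hours of transcription comes to $36.00 and 500,000 characters of generated speech to $11.00, while image generation is dominated by the output-token rate.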

Relationship with OpenAI

Even as it develops its own proprietary models, Microsoft continues to uphold its partnership with OpenAI, in which it has invested more than $13 billion. According to Suleyman, a recent renegotiation of the partnership agreement gave Microsoft the flexibility to pursue its own superintelligence research. This dual approach mirrors Microsoft’s hardware strategy, in which the company both produces its own chips and sources them from external suppliers.
