Microsoft · The Register
Microsoft shivs OpenAI with new AI models for speech, images
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
Microsoft on Thursday unveiled public preview versions of three home-baked machine learning models focused on speech recognition, speech synthesis, and image generation.
Key facts
- Microsoft is already eating its own dog food here – Copilot's Audio Expressions runs on MAI-Voice-1, while Copilot's Voice Mode transcription service uses MAI-Transcribe-1
- OpenAI, the AI hype-leader, is burning cash and is expected to lose $14 billion this year, according to internal projections published by The Information
- Naomi Moneypenny, who leads the Microsoft Azure AI Foundry Models product team, talked up the model arrivals in a blog post
- When Microsoft announced that it had renegotiated its agreement with OpenAI, the Windows biz indicated that the partnership would continue at least through 2032 – a scenario that assumes the AI market doesn't crash in the meantime
Summary
The release makes the Windows biz look more like a direct competitor to OpenAI than an investor – Redmond held an OpenAI stake valued at about $135 billion as of last October. The models include:
- MAI-Transcribe-1, a speech recognition model that delivers "enterprise-grade accuracy across 25 languages at approximately 50 percent lower GPU cost than leading alternatives"
- MAI-Voice-1, a speech generation model that can supposedly produce 60 seconds of audio in less than a second on a single GPU
- MAI-Image-2, a text-to-image model, to compound the despair of digital artists

OpenAI happens to offer its own speech recognition, speech generation, and text-to-image models. Microsoft's models are available through Foundry (formerly Azure AI Studio), a platform to develop AI agents and applications.