Microsoft · The Register
Microsoft shivs OpenAI with new AI models for speech, images
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
Microsoft on Thursday unveiled public preview versions of three home-baked machine learning models focused on speech recognition, speech synthesis, and image generation.
Key facts
- Microsoft is already eating its own dog food here – Copilot's Audio Expressions runs on MAI-Voice-1, while Copilot's Voice Mode transcription service uses MAI-Transcribe-1
- OpenAI, the AI hype-leader, is burning cash and is expected to lose $14 billion this year, according to internal projections published by The Information
- Naomi Moneypenny, who leads the Microsoft Azure AI Foundry Models product team, talked up the model arrivals in a blog post
- When Microsoft announced that it had renegotiated its agreement with OpenAI, the Windows biz indicated that the partnership would continue at least through 2032 – a scenario that assumes the AI market doesn't crash in the meantime
Summary
The release makes the Windows biz look more like a direct competitor to OpenAI than an investor – Redmond held an OpenAI stake valued at about $135 billion as of last October. The models include:
- MAI-Transcribe-1, a speech recognition model that delivers "enterprise-grade accuracy across 25 languages at approximately 50 percent lower GPU cost than leading alternatives"
- MAI-Voice-1, a speech generation model that can supposedly produce 60 seconds of audio in less than a second on a single GPU
- MAI-Image-2, a text-to-image model, to compound the despair of digital artists

OpenAI happens to offer its own speech recognition, speech generation, and text-to-image models. Microsoft's models are available through Foundry (formerly Azure AI Studio), a platform to develop AI agents and applications.