Microsoft launches Mu AI model for smart local tasks on Windows PCs

Microsoft has rolled out a small language model called Mu that runs locally on Copilot+ PCs. It delivers fast, accurate responses by running on the device’s Neural Processing Unit (NPU) rather than relying on cloud servers. Mu already powers the AI agent in the Windows Settings app for users on the Windows Insider Dev Channel: you can type natural-language queries like “turn on night light,” and Mu maps them to the relevant setting in real time.
What makes Mu special?
The model has 330 million parameters and uses a Transformer encoder-decoder architecture. The encoder processes the user’s input once, and the decoder then generates the response from that fixed representation, which reduces both per-token computation and the delay before the first token appears. Microsoft claims that Mu generates more than 100 tokens per second and has much lower latency than other models of its size.
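The latency advantage of that split can be sketched with a toy example (an illustration only, not Mu’s actual code): the input is encoded once up front, and every generated token reuses that encoding, so the per-token cost of decoding does not grow with the length of the input.

```python
# Toy sketch of the encoder-decoder split (not Mu's implementation).

def encode(tokens):
    """Stand-in for the encoder: a heavy pass over the full input, done once."""
    return [t * 2 for t in tokens]

def decode_step(memory, prev):
    """Stand-in for one decoder step: cheap, and it only reuses `memory`."""
    return (sum(memory) + prev) % 100

def generate(tokens, n_out):
    memory = encode(tokens)      # input processed exactly once
    out, prev = [], 0
    for _ in range(n_out):       # each output token is a light step
        prev = decode_step(memory, prev)
        out.append(prev)
    return out

print(generate([1, 2, 3], 4))    # → [12, 24, 36, 48]
```

In a decoder-only model, by contrast, the full prompt is re-attended at every generation step, which is part of why Microsoft cites lower latency for this design.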
Mu was trained on Azure using NVIDIA A100 GPUs, and its architecture incorporates techniques such as grouped-query attention and rotary positional embeddings, which help the model run efficiently even on limited hardware.
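Rotary positional embeddings, one of the techniques mentioned above, encode a token’s position by rotating pairs of feature dimensions through position-dependent angles, so attention scores end up depending only on the *relative* distance between tokens. A minimal NumPy sketch (an illustration of the general technique, not Mu’s implementation):

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply a rotary positional embedding to one vector at position `pos`."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # one rotation frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1[i], x2[i]) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# Rotations preserve vector norms...
print(np.isclose(np.linalg.norm(rope(q, 7)), np.linalg.norm(q)))   # True
# ...and attention scores depend only on relative position (10-3 == 7-0):
print(np.isclose(rope(q, 3) @ rope(k, 10), rope(q, 0) @ rope(k, 7)))  # True
```

Because no learned position table is needed, this approach keeps the parameter count down, which matters for a model meant to fit on an NPU.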
To ensure Mu runs well across different PCs, Microsoft partnered with Intel, AMD, and Qualcomm. Through post-training quantisation, the model’s weights were converted to lower-precision formats such as 8-bit and 16-bit integers, letting it fit within the memory and compute budgets of NPUs. On devices like the Surface Laptop 7, Mu achieves output speeds of over 200 tokens per second.

Microsoft initially tested a larger Phi model fine-tuned with LoRA, which met the accuracy bar but was too slow for the Settings experience. Mu, after fine-tuning, proved faster while still meeting the accuracy requirements. The team scaled Mu’s training data up to 3.6 million samples and extended coverage to hundreds of Windows system settings. Techniques like prompt tuning and noise injection improved the model’s robustness to real-world queries.
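Post-training quantisation of the kind described can be illustrated with a small NumPy sketch (a toy under simplifying assumptions, not Microsoft’s actual pipeline): float32 weights are mapped to 8-bit integers with a shared scale, roughly quartering memory at the cost of a small, bounded rounding error.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantisation: map floats to int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0          # largest weight maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32) * 0.02   # toy weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)                  # 4  (float32 -> int8)
print(np.abs(w - w_hat).max() < scale)       # True: error within one step
```

Production quantisers typically use finer-grained scales (e.g. per channel) and calibration data to keep the accuracy loss negligible, but the memory and bandwidth savings that make NPU inference practical come from exactly this reduction in precision.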
Mu is part of Microsoft’s broader push to bring AI features directly to the device. It builds on earlier research from models like Phi and Phi Silica and will likely play a key role in future AI experiences on Windows PCs.