Microsoft Open-Sources VibeVoice: A Frontier Voice AI Model with ASR, TTS, and Voice Cloning

Microsoft recently open-sourced VibeVoice on GitHub, releasing its voice AI technology stack under an open license. By the end of April, the project had reached 45,709 stars and over 5,100 forks, making it one of the most active open-source voice AI projects on GitHub.

VibeVoice is not a single model but a complete toolchain covering automatic speech recognition (ASR), text-to-speech (TTS), and voice cloning. The project directory is well-structured: vibevoice/ contains core model code, demo/ provides a Gradio interactive interface, finetuning-asr/ supports custom ASR fine-tuning, and vllm_plugin/ implements integration with the vLLM inference engine.

Commit activity shows several substantive updates in the past two weeks: the ASR demo added MPS/Apple Silicon support, the vLLM plugin fixed an out-of-memory (OOM) issue related to audio duration validation, and the documentation and contribution guides continue to improve. By the end of April, the repository listed 134 commits, or 796 commits when history across different branches is counted.

Notably, VibeVoice takes a practical engineering approach. The addition of the vLLM plugin means it can plug into existing large model inference infrastructure, lowering deployment barriers. Apple Silicon support allows Mac users to run the ASR demo locally without relying on GPU servers.
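To make the Apple Silicon point concrete, here is a minimal sketch of how a PyTorch-based demo might choose between a CUDA GPU, Apple's MPS backend, and the CPU. It is not taken from the VibeVoice repository: the function names pick_device and detect_device are hypothetical, and the sketch assumes only the standard PyTorch availability checks.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; otherwise report CPU.

    Hypothetical helper for illustration, not part of VibeVoice.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the preference order in a pure function separates it from the hardware probe, so the selection logic can be checked on any machine, including one without PyTorch installed.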

For developers who need voice capabilities, VibeVoice is worth attention for its completeness: most open-source voice projects focus on either ASR or TTS, while VibeVoice attempts to cover the full pipeline. However, as a newly open-sourced project, its community ecosystem and documentation still need time to mature. We recommend running the demo first before evaluating it for production use.
