Bulbul V3 is the latest and most advanced text-to-speech (TTS) model from Sarvam AI. Unveiled on February 5, 2026, it produces hyper-realistic, expressive audio tailored to India's linguistic diversity.
It handles code-mixing, regional accents, and complex prosody, is engineered for production settings, and sets new benchmarks in blind human evaluations.
When and where was Bulbul V3 released?
Bulbul V3 was released on February 5, 2026, as the first launch in Sarvam AI's ambitious 14-day series of AI product releases leading up to the India-AI Impact Summit (February 16-20, 2026).
It builds on Bulbul v2 (May 2025), which added 11 Indian languages, and arrives as Sarvam gains momentum under the Rs 10,300-crore IndiaAI Mission.
Who developed Bulbul V3, and what is its parent company?
Bulbul V3 was developed by Sarvam AI, a Bengaluru-based startup founded in 2023 and one of 12 projects selected under the IndiaAI Mission's sovereign LLM program.
The model was independently validated by Josh Talks through extensive blind listening tests. Sarvam's Indic-first focus reflects the background of its co-founders, Vivek Raghavan and Pratyush Kumar, in open-source language technology.
What was Bulbul V3 trained on?
The model was trained on large datasets that capture realistic Indian speech patterns, with particular focus on Hinglish code-mixing, regional dialects, numerical readings (e.g., phone numbers and dates), proper nouns, abbreviations, STEM jargon, and a range of emotional tones.
Rigorous fine-tuning was used to improve production stability and reduce hallucinations, skipped words, and unnatural artifacts on long-form or noisy inputs.
What voices does Bulbul V3 offer?
Bulbul V3 offers a library of 30-35 high-quality voices recorded by professional Indian voice artists. It covers 11 languages: Hindi, Tamil, Telugu, Kannada, Malayalam, Bengali, Marathi, Gujarati, Punjabi, Odia, and Assamese, with an aggressive expansion planned to all 22 scheduled Indian languages.
The voices are grouped into categories suited to industries such as edtech (tutors), BFSI (authoritative advisors), healthcare (empathetic tones), and entertainment (expressive characters).
How did Bulbul V3 perform in human listening tests?
Across 20,000 blind A/B tests, Bulbul V3 ranked first on 8 kHz telephony audio and second on general 48 kHz full-band audio, behind only ElevenLabs v3 alpha, while significantly outperforming Cartesia Sonic-3, Azure TTS, Google TTS, and AWS Polly.
It also scored best on key error metrics, with the fewest missed words, mispronunciations, and unnatural pauses.
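To make the idea of a blind A/B ranking concrete, here is a minimal sketch of how pairwise listener preferences can be turned into a per-system win rate. It assumes each judgment records two anonymized systems and which one the listener preferred (or a tie). This is a generic illustration, not Sarvam's or Josh Talks' actual scoring method, and the system names and judgments below are hypothetical toy data, not real results.

```python
from collections import Counter


def preference_rates(judgments):
    """Return, for each system, the fraction of its pairwise comparisons it won.

    judgments: list of (system_a, system_b, winner) tuples, where winner is
    system_a, system_b, or None for a tie. Ties count as comparisons but not wins.
    """
    wins = Counter()
    appearances = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b, None):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        appearances[a] += 1
        appearances[b] += 1
        if winner is not None:
            wins[winner] += 1
    return {system: wins[system] / appearances[system] for system in appearances}


# Hypothetical toy judgments for illustration only; NOT real evaluation data.
toy_judgments = [
    ("system_x", "system_y", "system_x"),
    ("system_x", "system_z", "system_x"),
    ("system_y", "system_z", None),
    ("system_x", "system_y", "system_y"),
]

rates = preference_rates(toy_judgments)
# Print systems from most to least preferred.
for system, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{system}: {rate:.2f}")
```

With the toy data, system_x wins 2 of its 3 comparisons (0.67), system_y wins 1 of 3 (0.33), and system_z wins 0 of 2 (0.00). Real evaluations typically also break results down by audio condition, such as 8 kHz telephony versus 48 kHz full-band.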
How can you use Bulbul V3?
API access is free and unrestricted through the Sarvam Developer Dashboard until February 28, 2026.
A no-code playground lets users test and iterate in real time. Scalable pricing applies after the promotional period, and the Sarvam Discord offers community support and pre-release access.
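For developers moving from the playground to the API, the sketch below shows how a TTS request might be assembled before it is sent. Everything here is an assumption to verify against Sarvam's official API documentation: the endpoint URL, the `api-subscription-key` header, the JSON field names (`text`, `target_language_code`, `speaker`, `model`), and the `bulbul:v3` model identifier. The speaker name is a hypothetical placeholder. The function only builds the request; it does not contact any server.

```python
import json

# ASSUMED endpoint; confirm against Sarvam's official API docs before use.
HYPOTHETICAL_TTS_URL = "https://api.sarvam.ai/text-to-speech"


def build_tts_request(text, language_code, speaker, model="bulbul:v3"):
    """Assemble (url, headers, json_body) for a hypothetical TTS call.

    All header and field names below are assumptions, not a confirmed API.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    headers = {
        "api-subscription-key": "<YOUR_API_KEY>",  # assumed auth header name
        "Content-Type": "application/json",
    }
    body = {
        "text": text,                            # assumed field name
        "target_language_code": language_code,   # e.g. a BCP-47 tag such as "hi-IN"
        "speaker": speaker,                      # hypothetical voice name
        "model": model,                          # assumed model identifier
    }
    # ensure_ascii=False keeps Indic scripts readable in the serialized body.
    return HYPOTHETICAL_TTS_URL, headers, json.dumps(body, ensure_ascii=False)


url, headers, payload = build_tts_request(
    "नमस्ते, आपका स्वागत है", "hi-IN", "hypothetical-speaker"
)
print(url)
print(payload)
```

Keeping request construction separate from the network call makes the payload easy to inspect and test; the actual POST can then be made with any HTTP client once the real field names are confirmed.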
Bulbul V3 can power immersive voice agents for gig-economy onboarding (e.g., multilingual task briefs), banking collections (natural Hinglish reminders), edtech tutors (engaging story narration), healthcare reminders (empathetic, accent-aware delivery), and AI-driven video dubbing and games (character voices).
Its stability makes it well suited to 24/7 contact centers handling millions of calls.