Issue #4 · The Deep Cut · 2 May 2026 · 4 min read

528 million Indians speak Hindi.
ChatGPT learned it as an afterthought.

India has the largest non-English speaking internet population on Earth. Yet every major AI was trained on data that is 96% English. Three Indian teams are racing to fix this. One is government-funded. One is Ola’s Bhavish Aggarwal. One is a startup nobody was watching — until they weren’t.

528M
Hindi speakers in India (2011 Census) — yet AI was built for a world that writes in English
— The Story

The AI that doesn’t think in your language

Here is something that should bother you more than it does. When you type in Hindi to ChatGPT, it is not really thinking in Hindi. Because the model learned overwhelmingly from English, it in effect routes your words through English-shaped representations, reasons there, and renders the answer back into Hindi. Like a schoolteacher who marks a Hindi essay wrong for following English sentence structure, the model sees your language through a foreign lens.

This is not a minor technical footnote. It means that every AI tool you are using right now — ChatGPT, Claude, Gemini — was trained on a version of the world where Indian languages barely exist. English accounts for roughly half of all content on the world’s most visited websites; Hindi, Tamil, Telugu, Kannada, Bengali and every other Indian language combined account for under 1% of that pool — less than one percent of what the models learned from. Yet 57% of Indian internet users prefer to access the internet in Indian languages. The gap between what AI learned and what India needs is not a minor inefficiency. It is a structural failure.
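One piece of this disadvantage is visible in plain Python, no model required: UTF-8 stores English in one byte per character but Devanagari in three, so byte-level tokenizers trained mostly on English data see Hindi as a longer, rarer byte sequence and fragment it into many more tokens. A minimal illustration (this is a sketch of the general mechanism, not any specific model’s tokenizer):

```python
# Compare how many UTF-8 bytes per character English and Hindi cost.
# Byte-pair tokenizers merge frequent byte sequences from their training
# data; with mostly-English data, the longer Devanagari byte stream gets
# split into far more tokens for the same sentence.

def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes needed per character of `text`."""
    return len(text.encode("utf-8")) / len(text)

english = "How are you?"
hindi = "आप कैसे हैं?"  # the same question in Hindi

print(f"English: {bytes_per_char(english):.2f} bytes/char")  # 1.00
print(f"Hindi:   {bytes_per_char(hindi):.2f} bytes/char")    # 2.50
```

Same question, two and a half times the bytes — before the model has processed a single word.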

“We have the world’s second-largest internet population. We produce less than 4% of its content. AI learned from what exists — not what we need.”

The gap between what AI trained on and what India actually needs

The consequence is real. Indian professionals who prompt in Hindi get answers that are technically accurate but culturally off. The idioms are wrong. The references are American. The suggestions assume infrastructure that doesn’t exist in most of India. A lawyer in Lucknow asking an AI to draft a notice gets something that reads like it was written by someone who learned Hindi from a textbook.

Three teams that decided not to wait

In 2019, a group of IIM-Ahmedabad graduates started building AI for Indian languages. In 2021, the Indian government launched a national translation mission. In 2023, Ola’s founder decided that India needed its own AI model — and announced it at a press conference before the product existed. By 2024, India had produced its first AI unicorn. By 2025, one of these three had raised more money than any Indian AI startup in history.

These are not the same story. They have different funders, different philosophies, and different definitions of what “building for India” actually means. One of them is genuinely impressive. One is more hype than product. One is technically the most important but almost nobody is talking about it.

The tool fight below will tell you which is which. But first — the number that puts all of this in context.
📊 The number

India has 528 million Hindi speakers (2011 Census), 97 million Bengali speakers, 81 million Telugu speakers, and 69 million Tamil speakers — among others. The combined population of India’s four largest language communities is larger than the entire population of the United States.

Yet 57% of Indian internet users prefer accessing the internet in Indian languages — and find it almost entirely in English. The IAMAI Internet in India Report 2024 puts active internet users at 886 million. More than 500 million of them would rather not be reading this in English. The AI tools built for this population? Fewer than a dozen serious products. The AI tools built for 340 million Americans? Thousands.

Why Sarvam matters more than the headlines suggest

Sarvam AI is not the most famous name in Indian AI. That title belongs to Krutrim, which Bhavish Aggarwal announced to great fanfare and which became India’s first AI unicorn in record time. But fame and quality are not the same thing, especially in AI.

Sarvam was built by researchers who left global AI labs to solve a specific problem: Indian languages are not just different vocabularies. They are different grammatical structures, different scripts, different phonologies. Hindi uses the Devanagari script. Tamil uses a script with 247 characters. Telugu has 56. An AI model that truly understands these languages cannot be built by fine-tuning a model that learned from Wikipedia in English.
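The point about scripts is checkable in a few lines: each major Indian script occupies its own Unicode block, so supporting these languages means handling entirely different character ranges, not just new vocabulary. A small sketch using Python’s standard unicodedata module:

```python
import unicodedata

def script_of(ch: str) -> str:
    """Return the script prefix from a character's official Unicode name."""
    return unicodedata.name(ch).split()[0]

# A similar consonant lives in a completely different block in each script.
for ch in ["क", "த", "త", "ক"]:
    print(ch, script_of(ch), hex(ord(ch)))
# Prints DEVANAGARI, TAMIL, TELUGU, BENGALI with four distinct code ranges.
```

A model fine-tuned on English Wikipedia has seen almost none of these code points in context, which is why training on Indian-language data from scratch matters.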

Sarvam trained on Indian language data from the ground up. Their text-to-speech in Hindi sounds like someone who grew up speaking Hindi — not someone who learned it from a language app. Their last confirmed round was a $41 million Series A in December 2023. As of April 2026, they are in advanced talks to raise $300–350 million at a $1.5 billion valuation — with Bessemer Venture Partners, Nvidia and Amazon reportedly participating. The round has not closed. But the direction of travel is clear. And unlike Krutrim, their product exists, works, and is available through an API that Indian developers can use today.
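For developers wondering what integration looks like, here is a minimal sketch of calling a hosted pay-per-use TTS API over HTTP. The endpoint URL, auth scheme, and field names below are placeholders of my own, not Sarvam’s documented contract — check their API reference for the real parameters:

```python
import json
import urllib.request

# Placeholder endpoint; the real URL and schema come from the provider's docs.
API_URL = "https://api.example-tts.example/v1/text-to-speech"

def build_tts_request(text: str, language: str, api_key: str) -> urllib.request.Request:
    """Assemble a JSON POST request for a hypothetical TTS endpoint."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
        method="POST",
    )

req = build_tts_request("नमस्ते, आप कैसे हैं?", "hi-IN", "YOUR_API_KEY")
# urllib.request.urlopen(req) would then return the synthesized audio bytes.
```

The shape is the point: one authenticated POST with text and a language code, billed per call — low enough friction that a two-person team can ship a Hindi voice feature in an afternoon.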

What Bhashini is quietly becoming

Bhashini is the government’s answer. Launched by the Ministry of Electronics and Information Technology, it is a national language technology mission with one goal: make AI accessible in all 22 scheduled Indian languages. It is free. It is available through an open API. And because it is government-backed, it has something no private startup can match: institutional data access.

Government forms, court records, parliamentary debates, educational materials, health advisories — all of this exists in Indian languages, all of it locked in formats that private companies cannot easily access. Bhashini has access to most of it. This gives their translation models a depth of vocabulary and context that private competitors struggle to match in specialized domains.

The limitation is equally institutional: government products move slowly, get updated infrequently, and have user experiences that were designed by committee. Bhashini is powerful infrastructure that nobody has built a compelling product on top of yet. That gap is an opportunity for whoever reads this and moves first.

— The Tool Fight
Sarvam vs Krutrim vs Bhashini
One winner per category. No “it depends.” Tested for real Indian professional use cases.
Hindi voice quality
Sarvam AI*: ✔ Natural, unaccented · ★ Winner
Krutrim: Decent but slightly robotic on longer sentences
Bhashini: Functional, not natural (infrastructure only)

Regional language depth (Tamil, Telugu, Kannada)
Sarvam AI: ✔ All three, high quality · ★ Winner
Krutrim: Hindi-first, others limited
Bhashini: 22 languages including rare ones · ★ Breadth winner

Translation accuracy (legal & professional text)
Sarvam AI: Good on general content, strong on professional
Krutrim: Inconsistent on technical terms
Bhashini: ✔ Best for official/legal language · ★ Winner

API access & developer experience
Sarvam AI: ✔ Clean API, good docs, pay-per-use · ★ Winner
Krutrim: API exists, docs are thin
Bhashini: Free API, complex integration

Cost for Indian SMEs
Sarvam AI: Pay-per-call, reasonable at scale
Krutrim: Consumer product, limited B2B pricing
Bhashini: ✔ Free · ★ Winner

Product maturity (ships vs promises)
Sarvam AI: ✔ Shipped, production-ready, updated regularly · ★ Winner
Krutrim: More announcements than shipped product so far (editorial view)
Bhashini: Stable but slow to update
* Sarvam voice = Bulbul (TTS model). Sarvam ASR = Saaras (STT model). Sarvam reasoning = Sarvam 30B/105B.  |  Based on testing and public benchmarks as of May 2026.
🎯 The verdict

For Indian developers and businesses: Start with Sarvam for voice and text. Use Bhashini’s translation API for legal or government-domain content (it’s free). Ignore Krutrim until the product catches up to the press releases.

For regional language content creators: Sarvam’s text-to-speech is the only tool currently producing Hindi, Tamil, and Telugu voice output that sounds like a native speaker. ElevenLabs is better for English and Hinglish. Neither is perfect for Kannada or Bengali yet — Bhashini is your fallback.

— One Thing
This week in India AI
Sarvam open-sourced two reasoning models trained entirely in India — and most people missed it
In February 2026, while the world was debating GPT-5 and DeepSeek, Sarvam released Sarvam 30B and Sarvam 105B as open-source models on Hugging Face — Apache 2.0, free to use. Both trained from scratch in India on IndiaAI Mission compute. Both outperform models of equivalent size on Indian language benchmarks. The 105B model (named Indus) powers their consumer app. The 30B model runs their enterprise voice platform. Saaras — their speech-to-text model — separately beats GPT-4o and Gemini 3 Pro on Indian language transcription. Three real products, shipped, in production. Not a press release. Now you know where to find them: huggingface.co/sarvamai
Indian AI tools directory →
— The Ask
💬
Know someone building in India who should read this?
Forward it. One WhatsApp. One LinkedIn message. That’s how India AI Brief grows — not through ads, through people who found it useful enough to share. There’s no referral scheme. Just this single ask.
💬 Forward on WhatsApp →
India AI Brief · Free Weekly
Get the next issue in your inbox
Every Saturday. Real stacks, INR pricing, honest takes — written for Indian professionals, not Silicon Valley.