Issue #4 · The Deep Cut · 2 May 2026 · 4 min read

528 million Indians speak Hindi.
ChatGPT learned it as an afterthought.

India has the largest non-English speaking internet population on Earth. Yet every major AI was trained on data that is 96% English. Three Indian teams are racing to fix this. One is government-funded. One is Ola’s Bhavish Aggarwal. One is a startup nobody was watching — until they weren’t.

528M
Hindi speakers in India (2011 Census) — yet AI was built for a world that writes in English
— The Story

The AI that doesn’t think in your language

Here is something that should bother you more than it does. When you type in Hindi to ChatGPT, it is not really thinking in Hindi. Because the model learned overwhelmingly from English, it in effect routes your words through English-shaped representations, reasons there, and renders the answer back into Hindi. Like a schoolteacher who marks a Hindi essay wrong for following English sentence structure, the model sees your language through a foreign lens.

This is not a minor technical footnote. It means that every AI tool you are using right now — ChatGPT, Claude, Gemini — was trained on a version of the world where Indian languages barely exist. English accounts for roughly half of all content on the world’s most visited websites; Hindi, Tamil, Telugu, Kannada, Bengali and every other Indian language combined account for under 1% of that pool — less than one percent of what the models learned from. Yet 57% of Indian internet users prefer to access the internet in Indian languages. The gap between what AI learned and what India needs is not a minor inefficiency. It is a structural failure.
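One piece of this disadvantage is visible in plain Python, no model required: UTF-8 stores English in one byte per character but Devanagari in three, so byte-level tokenizers trained mostly on English data see Hindi as a longer, rarer byte sequence and fragment it into many more tokens. A minimal illustration (this is a sketch of the general mechanism, not any specific model’s tokenizer):

```python
# Compare how many UTF-8 bytes per character English and Hindi cost.
# Byte-pair tokenizers merge frequent byte sequences from their training
# data; with mostly-English data, the longer Devanagari byte stream gets
# split into far more tokens for the same sentence.

def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes needed per character of `text`."""
    return len(text.encode("utf-8")) / len(text)

english = "How are you?"
hindi = "आप कैसे हैं?"  # the same question in Hindi

print(f"English: {bytes_per_char(english):.2f} bytes/char")  # 1.00
print(f"Hindi:   {bytes_per_char(hindi):.2f} bytes/char")    # 2.50
```

Same question, two and a half times the bytes — before the model has processed a single word.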

“We have the world’s second-largest internet population. We produce less than 4% of its content. AI learned from what exists — not what we need.”

The gap between what AI trained on and what India actually needs

The consequence is real. Indian professionals who prompt in Hindi get answers that are technically accurate but culturally off. The idioms are wrong. The references are American. The suggestions assume infrastructure that doesn’t exist in most of India. A lawyer in Lucknow asking an AI to draft a notice gets something that reads like it was written by someone who learned Hindi from a textbook.

Three teams that decided not to wait

In 2019, a group of IIM-Ahmedabad graduates started building AI for Indian languages. In 2021, the Indian government launched a national translation mission. In 2023, Ola’s founder decided that India needed its own AI model — and announced it at a press conference before the product existed. By 2024, India had produced its first AI unicorn. By 2025, one of these three had raised more money than any Indian AI startup in history.

These are not the same story. They have different funders, different philosophies, and different definitions of what “building for India” actually means. One of them is genuinely impressive. One is more hype than product. One is technically the most important but almost nobody is talking about it.

The tool fight below will tell you which is which. But first — the number that puts all of this in context.
📊 The number

India has 528 million Hindi speakers (2011 Census), 97 million Bengali speakers, 81 million Telugu speakers, and 69 million Tamil speakers — among others. The combined population of India’s four largest language communities is larger than the entire population of the United States.

Yet 57% of Indian internet users prefer accessing the internet in Indian languages — and find it almost entirely in English. The IAMAI Internet in India Report 2024 puts active internet users at 886 million. More than 500 million of them would rather not be reading this in English. The AI tools built for this population? Fewer than a dozen serious products. The AI tools built for 340 million Americans? Thousands.

Why Sarvam matters more than the headlines suggest

Sarvam AI is not the most famous name in Indian AI. That title belongs to Krutrim, which Bhavish Aggarwal announced to great fanfare and which became India’s first AI unicorn in record time. But fame and quality are not the same thing, especially in AI.

Sarvam was built by researchers who left global AI labs to solve a specific problem: Indian languages are not just different vocabularies. They are different grammatical structures, different scripts, different phonologies. Hindi uses the Devanagari script. Tamil uses a script with 247 characters. Telugu has 56. An AI model that truly understands these languages cannot be built by fine-tuning a model that learned from Wikipedia in English.
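The point about scripts is checkable in a few lines: each major Indian script occupies its own Unicode block, so supporting these languages means handling entirely different character ranges, not just new vocabulary. A small sketch using Python’s standard unicodedata module:

```python
import unicodedata

def script_of(ch: str) -> str:
    """Return the script prefix from a character's official Unicode name."""
    return unicodedata.name(ch).split()[0]

# A similar consonant lives in a completely different block in each script.
for ch in ["क", "த", "త", "ক"]:
    print(ch, script_of(ch), hex(ord(ch)))
# Prints DEVANAGARI, TAMIL, TELUGU, BENGALI with four distinct code ranges.
```

A model fine-tuned on English Wikipedia has seen almost none of these code points in context, which is why training on Indian-language data from scratch matters.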

Sarvam trained on Indian language data from the ground up. Their text-to-speech in Hindi sounds like someone who grew up speaking Hindi — not someone who learned it from a language app. Their last confirmed round was a $41 million Series A in December 2023. As of April 2026, they are in advanced talks to raise $300–350 million at a $1.5 billion valuation — with Bessemer Venture Partners, Nvidia and Amazon reportedly participating. The round has not closed. But the direction of travel is clear. And unlike Krutrim, their product exists, works, and is available through an API that Indian developers can use today.
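For developers wondering what integration looks like, here is a minimal sketch of calling a hosted pay-per-use TTS API over HTTP. The endpoint URL, auth scheme, and field names below are placeholders of my own, not Sarvam’s documented contract — check their API reference for the real parameters:

```python
import json
import urllib.request

# Placeholder endpoint; the real URL and schema come from the provider's docs.
API_URL = "https://api.example-tts.example/v1/text-to-speech"

def build_tts_request(text: str, language: str, api_key: str) -> urllib.request.Request:
    """Assemble a JSON POST request for a hypothetical TTS endpoint."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
        method="POST",
    )

req = build_tts_request("नमस्ते, आप कैसे हैं?", "hi-IN", "YOUR_API_KEY")
# urllib.request.urlopen(req) would then return the synthesized audio bytes.
```

The shape is the point: one authenticated POST with text and a language code, billed per call — low enough friction that a two-person team can ship a Hindi voice feature in an afternoon.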

What Bhashini is quietly becoming

Bhashini is the government’s answer. Launched by the Ministry of Electronics and Information Technology, it is a national language technology mission with one goal: make AI accessible in all 22 scheduled Indian languages. It is free. It is available through an open API. And because it is government-backed, it has something no private startup can match: institutional data access.

Government forms, court records, parliamentary debates, educational materials, health advisories — all of this exists in Indian languages, all of it locked in formats that private companies cannot easily access. Bhashini has access to most of it. This gives their translation models a depth of vocabulary and context that private competitors struggle to match in specialized domains.

The limitation is equally institutional: government products move slowly, get updated infrequently, and have user experiences that were designed by committee. Bhashini is powerful infrastructure that nobody has built a compelling product on top of yet. That gap is an opportunity for whoever reads this and moves first.

— The Tool Fight
Sarvam vs Krutrim vs Bhashini
One winner per category. No “it depends.” Tested for real Indian professional use cases.
Hindi voice quality
Sarvam AI*: ✔ Natural, unaccented · ★ Winner
Krutrim: Decent but slightly robotic on longer sentences
Bhashini: Functional, not natural (infrastructure only)

Regional language depth (Tamil, Telugu, Kannada)
Sarvam AI: ✔ All three, high quality · ★ Winner
Krutrim: Hindi-first, others limited
Bhashini: 22 languages including rare ones · ★ Breadth winner

Translation accuracy (legal & professional text)
Sarvam AI: Good on general content, strong on professional
Krutrim: Inconsistent on technical terms
Bhashini: ✔ Best for official/legal language · ★ Winner

API access & developer experience
Sarvam AI: ✔ Clean API, good docs, pay-per-use · ★ Winner
Krutrim: API exists, docs are thin
Bhashini: Free API, complex integration

Cost for Indian SMEs
Sarvam AI: Pay-per-call, reasonable at scale
Krutrim: Consumer product, limited B2B pricing
Bhashini: ✔ Free · ★ Winner

Product maturity (ships vs promises)
Sarvam AI: ✔ Shipped, production-ready, updated regularly · ★ Winner
Krutrim: More announcements than shipped product so far (editorial view)
Bhashini: Stable but slow to update
* Sarvam voice = Bulbul (TTS model). Sarvam ASR = Saaras (STT model). Sarvam reasoning = Sarvam 30B/105B.  |  Based on testing and public benchmarks as of May 2026.
🎯 The verdict

For Indian developers and businesses: Start with Sarvam for voice and text. Use Bhashini’s translation API for legal or government-domain content (it’s free). Ignore Krutrim until the product catches up to the press releases.

For regional language content creators: Sarvam’s text-to-speech is the only tool currently producing Hindi, Tamil, and Telugu voice output that sounds like a native speaker. ElevenLabs is better for English and Hinglish. Neither is perfect for Kannada or Bengali yet — Bhashini is your fallback.

— One Thing
This week in India AI
Sarvam open-sourced two reasoning models trained entirely in India — and most people missed it
In February 2026, while the world was debating GPT-5 and DeepSeek, Sarvam released Sarvam 30B and Sarvam 105B as open-source models on Hugging Face — Apache 2.0, free to use. Both trained from scratch in India on IndiaAI Mission compute. Both outperform models of equivalent size on Indian language benchmarks. The 105B model (named Indus) powers their consumer app. The 30B model runs their enterprise voice platform. Saaras — their speech-to-text model — separately beats GPT-4o and Gemini 3 Pro on Indian language transcription. Three real products, shipped, in production. Not a press release. Now you know where to find them: huggingface.co/sarvamai
Indian AI tools directory →
— The Ask
💬
Know someone building in India who should read this?
Forward it. One WhatsApp. One LinkedIn message. That’s how India AI Brief grows — not through ads, through people who found it useful enough to share. There’s no referral scheme. Just this single ask.
💬 Forward on WhatsApp →
India AI Brief · Free Weekly
Get the next issue in your inbox
Every Saturday. Real stacks, INR pricing, honest takes — written for Indian professionals, not Silicon Valley.