News Brief
Swarajya Staff
Aug 06, 2025, 02:44 PM | Updated 02:44 PM IST
The Centre has announced that BharatGen, the country’s first government-backed initiative to develop foundational AI models tailored to Indian languages and societal contexts, is on track to support all 22 scheduled Indian languages by June 2026.
Currently, BharatGen models support nine languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu, and Kannada.
In a written reply to a question in the Lok Sabha on Wednesday (6 August), Union Minister of Science and Technology Jitendra Singh stated that BharatGen encompasses AI technologies across multiple modalities, including large language models for text, speech-to-text and text-to-speech systems, and vision-language models.
The next milestone is set for December 2025, by which time the models are to cover 15 languages, with the remaining scheduled languages to be added by mid-2026.
"Currently, BharatGen models cover 9 Indian languages which include Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu, and Kannada," the minister said.
"By December 2025, a total of 15 Indian Languages (including Assamese, Bengali, Gujarati, Hindi, Kannada, Maithili, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Sindhi, Tamil and Telugu) will be covered," he said.
"By June 2026, all 22 scheduled Indian languages will be covered," Singh added.
The minister said that BharatGen has developed sector-specific applications in agriculture, governance, and defence. These have been piloted in selected regions and are intended to be rolled out across all states and districts once the deployment phase is complete.
The initiative operates under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology.
The implementation is led by two Technology Innovation Hubs (TIHs): the TIH Foundation for IoT and IoE at IIT Bombay, which coordinates the national program and oversees model development, and the IITM Pravartak Technologies Foundation at IIT Madras, which focuses on real-world deployment in areas like governance, media, and security.
The BharatGen consortium comprises several top-tier academic and research institutions.
IIT Bombay acts as the lead institution, managing research integration across partners. IIIT Hyderabad leads on vision-language modelling, IIT Madras is responsible for speech model development, and IIT Kanpur focuses on legal AI and multilingual tokenization strategies. IIT Hyderabad works on vocabulary optimization for large language models, IIT Mandi is tasked with inclusive model development and efficient training methods, and IIM Indore handles model evaluation and multilingual data collection.