About Us | Anuvadika

About Us

Anuvadika is the CIIL translation tool that integrates transliteration and language models. You can translate text from any supported Indian language (or English) into any other supported Indian language (or English). The web application also supports transliteration of both the source and the target text. For ease of use, we have also implemented a language detection module that identifies the language of the source text automatically. Users can still choose the source language manually.

Translation Engine

The LDC-IL Translation Tool is built on the latest Bhashini models, using various kinds of parallel and language corpora, including all the LDC-IL text corpora.

Specifically, this platform uses IndicTrans2 (Gala et al., 2023) from AI4Bharat, a research lab at IIT Madras, supported by TDIL, NLTM and LDC-IL.

The trained models from AI4Bharat are available at https://github.com/AI4Bharat/IndicTrans2. IndicTrans2 is a transformer-based multilingual NMT model that supports high-quality translation across all 22 scheduled Indic languages as well as English, and covers multiple scripts for a few Indian languages.

There are three 1.1B-parameter models (en-indic, indic-indic, indic-en). Of these, we use the distilled 600M-parameter versions in their optimized CTranslate2 format for faster inference.

Further, we internally use a unified API that routes each user request to the appropriate model for inference, while facilitating dynamic batching for higher throughput.
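The routing idea can be illustrated with a minimal sketch. It assumes that the model is chosen purely from whether English is the source or the target language; the function names pick_model and group_requests are hypothetical and are not taken from the actual service. Grouping pending requests by model is shown only as a simplified stand-in for dynamic batching, which in practice also involves timing and batch-size limits.

```python
from collections import defaultdict

ENGLISH = "English"


def pick_model(source_lang, target_lang):
    """Return the model direction for a language pair (hypothetical helper)."""
    if source_lang == target_lang:
        raise ValueError("source and target languages must differ")
    if source_lang == ENGLISH:
        return "en-indic"
    if target_lang == ENGLISH:
        return "indic-en"
    return "indic-indic"


def group_requests(requests):
    """Group (source, target, text) requests by model direction.

    Each group could then be sent to its model as one batch. This is a
    simplification: real dynamic batching also waits for a short time
    window and caps the batch size.
    """
    batches = defaultdict(list)
    for source_lang, target_lang, text in requests:
        model = pick_model(source_lang, target_lang)
        batches[model].append((source_lang, target_lang, text))
    return dict(batches)


pending = [
    ("English", "Hindi", "Good morning"),
    ("Tamil", "English", "வணக்கம்"),
    ("Hindi", "Tamil", "नमस्ते"),
    ("English", "Bengali", "Thank you"),
]
batches = group_requests(pending)
```

Here the two requests with English as the source end up in the same en-indic group, while the Tamil-to-English and Hindi-to-Tamil requests go to the indic-en and indic-indic groups respectively.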

The following are the supported languages that can be used for n x n translations:

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • English
  • Gujarati
  • Hindi
  • Kannada
  • Kashmiri (Arabic)
  • Kashmiri (Devanagari)
  • Konkani
  • Maithili
  • Malayalam
  • Manipuri (Bengali)
  • Manipuri (Meetei-Mayek)
  • Marathi
  • Nepali
  • Odia
  • Punjabi
  • Sanskrit
  • Santali
  • Sindhi (Arabic)
  • Sindhi (Devanagari)
  • Tamil
  • Telugu
  • Urdu

If you use this tool for research purposes, please acknowledge LDC-IL by citing this paper:
"Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation. Springer, Vol. 55, Issue 1"

Language Detection Model

The translation tool also deploys a language detection engine to automatically detect the language of the given source text. For this purpose, we use a language detection model from Facebook AI named "lid218e" (Joulin et al., 2016). It was released as part of the NLLB project and can detect 217 languages. It is built on fastText, a library for efficient learning of word representations and sentence classification. Languages outside the supported list below are still detected, but they are classified as unknown and only their ISO language code is shown.

The model can detect the following supported languages:

  • Hindi
  • Urdu
  • Sindhi
  • Bengali
  • Punjabi
  • Gujarati
  • Telugu
  • Tamil
  • Nepali
  • Marathi
  • Kannada
  • Malayalam
  • Assamese
  • Odia
  • Maithili
  • Konkani
  • Magahi
  • Chhattisgarhi
  • Kashmiri (Devanagari)
  • Bhojpuri (Devanagari)
  • Manipuri (Bengali)
  • Santali
  • Awadhi
  • English
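
The handling of supported and unknown languages described above could look roughly like the following sketch. It assumes that the detector returns fastText-style labels of the form __label__<language>_<script> (for example __label__hin_Deva). The SUPPORTED table is a small illustrative subset, not the tool's actual mapping, and resolve_label is a hypothetical helper name.

```python
LABEL_PREFIX = "__label__"

# Illustrative subset only (assumption); the real tool covers the full list above.
SUPPORTED = {
    "hin_Deva": "Hindi",
    "urd_Arab": "Urdu",
    "ben_Beng": "Bengali",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
    "eng_Latn": "English",
}


def resolve_label(label):
    """Map a detector label to (display name, ISO language code).

    Labels outside the supported table are reported as "Unknown",
    together with the ISO code taken from the label itself.
    """
    code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    iso_code = code.split("_")[0]
    name = SUPPORTED.get(code, "Unknown")
    return name, iso_code


supported_result = resolve_label("__label__hin_Deva")
unknown_result = resolve_label("__label__fra_Latn")
```

With these inputs, the Hindi label resolves to its display name, while the French label is reported as unknown along with its code "fra".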