About LDC-IL

Established in 2007, the Linguistic Data Consortium for Indian Languages (LDC-IL) is a scheme of the Department of Higher Education, Ministry of Human Resource Development, Government of India, implemented by and housed at the Central Institute of Indian Languages, Mysore.

The Consortium is currently fully funded by the Government of India. As its name suggests, it is expected to generate its own funds and become a self-sufficient institution by developing resources and distributing them to the developers, researchers and organizations that use them.

LDC-IL has been distributing linguistic resources for Artificial Intelligence (AI) and Natural Language Processing (NLP), mainly in Indian languages, through its Data Distribution Portal since 4th April, 2019, when the portal was launched by the Hon'ble Vice President, Shri Venkaiah Naidu.

Language data is a key ingredient of research and development in language technology. As time goes by, an increasing number of researchers are seeing the potential benefits of using an electronic corpus as a source of empirical language data for their research. The issues surrounding the collection, processing and annotation of large quantities of linguistic data make it necessary to involve a number of disciplines, such as linguistics, computer science, statistics and engineering. Corpus linguists often use computational methods when analyzing their data, whereas computational linguists depend on computer-readable linguistic data for their research and for building practical tools and programmes.