A collection of basic text processing modules for Indian languages

View the Project on GitHub nisargjhaveri/indicNLP

indicNLP is a collection of common tools for text-based natural language processing of Indian languages. Many Indian languages are similar in nature, with some differences, and most of them share common or similar solutions to NLP and Information Retrieval and Extraction (IRE) tasks. Hence a single framework for all of them.
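As an illustration of why a shared approach is plausible, here is a minimal sketch of a tokenizer that works unchanged on several Indian scripts. It is not indicNLP's tokenizer and does not use its API; the function name `tokenize` and the separator set are assumptions made for this example. It splits on whitespace, common ASCII punctuation, and the danda (U+0964) and double danda (U+0965) used as sentence terminators in many Indian scripts.

```python
import re

# Hypothetical separator set (an assumption for this sketch): whitespace,
# danda (U+0964), double danda (U+0965), and common ASCII punctuation.
_SEPARATORS = re.compile(r"""[\s\u0964\u0965,.!?;:"'()\[\]{}-]+""")


def tokenize(text):
    """Split text into tokens on the separators above, dropping empty strings.

    Splitting on separators (instead of matching \\w+) keeps combining vowel
    signs and viramas attached to their base consonants.
    """
    return [token for token in _SEPARATORS.split(text) if token]


if __name__ == "__main__":
    print(tokenize("मैं घर जा रहा हूँ। तुम कहाँ हो?"))  # Hindi
    print(tokenize("હું ઘરે જાઉં છું."))                # Gujarati
```

The same function handles both inputs because it never inspects script-specific characters; only the separator set would need extending for a language with additional punctuation.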

It includes

Code quality and Quality assurance

Build Status Coverage Status



indicNLP, IRE, NLP, Indian Languages, Tokenizer, stopwords, POS tagger, Stemmer, NER, Document Classification, Categorization, Spelling Variation Identification, Writing Variation Identification, text processing.

Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sindhi, Tamil, Telugu, Tibetan.

Information Retrieval and Extraction Course, Major Project, IIIT-H.


GitHub repository:
Project homepage:

Project report:
YouTube (Presentation and Demo):
SlideShare (Presentation):
Dropbox: Dropbox shared folder