indicNLP is a collection of common tools for text-based natural language processing of Indian languages. Many Indian languages are similar in nature, with some differences, and most of them share common or similar solutions to NLP and information retrieval and extraction (IRE) tasks. Hence a single framework for all of them.
indicNLP, IRE, NLP, Indian Languages, Tokenizer, stopwords, POS tagger, Stemmer, NER, Document Classification, Categorization, Spelling Variation Identification, Writing Variation Identification, text processing.
Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sindhi, Tamil, Telugu, Tibetan.
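To illustrate how one solution can be shared across several of these languages, here is a minimal sketch of a script-agnostic tokenizer followed by stopword removal. It is not taken from indicNLP: the names `tokenize`, `remove_stopwords`, and `SAMPLE_STOPWORDS`, as well as the tiny Hindi stopword set, are hypothetical and chosen only for illustration. The sketch splits on whitespace and a fixed punctuation set that includes the danda (U+0964) and double danda (U+0965), rather than relying on `\w`, because Python's `re` module does not treat combining vowel signs as word characters.

```python
import re

# Hypothetical helper names; none of these come from indicNLP itself.
# Punctuation to split on, including the danda (U+0964) and double danda
# (U+0965), which end sentences in several Indic scripts.
_PUNCT = "\u0964\u0965.,;:!?\"'()[]{}-"

# A token is any run of characters that is neither whitespace nor punctuation.
# This keeps vowel signs and viramas attached to their base consonants.
_TOKEN_RE = re.compile("[^\\s" + re.escape(_PUNCT) + "]+")

# Illustrative stopword sets keyed by language code (assumed, not exhaustive).
SAMPLE_STOPWORDS = {
    "hi": {"यह", "एक", "है"},
}


def tokenize(text):
    """Split text into tokens on whitespace and the punctuation in _PUNCT."""
    return _TOKEN_RE.findall(text)


def remove_stopwords(tokens, lang):
    """Drop tokens found in the sample stopword set for the given language."""
    stopwords = SAMPLE_STOPWORDS.get(lang, set())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    tokens = tokenize("यह एक परीक्षण है।")
    print(tokens)
    print(remove_stopwords(tokens, "hi"))
```

Because the tokenizer only depends on whitespace and a punctuation set, the same function can be reused for other scripts in the list; only the stopword set would need to change per language.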
Information Retrieval and Extraction Course, Major Project, IIIT-H.
Project report: http://nisargjhaveri.github.io/indicNLP/report.pdf
YouTube (Presentation and Demo): https://youtu.be/Pwh1NYAF5Gw
SlideShare (Presentation): http://www.slideshare.net/NisargJhaveri/indicnlp-a-text-processing-framework-for-indian-languages
DropBox: DropBox shared folder