State-of-the-Art Language Modeling and Text Classification in Hindi Language
This project is maintained by NirantK
We achieved a state-of-the-art perplexity of 46.81 for Hindi, compared with 40.68 for English (lower is better).
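Perplexity is the exponential of the average per-token cross-entropy (in nats) on held-out text, so lower values mean the model is less surprised by each next token. The following is a minimal sketch of that standard definition in plain Python. The function names are illustrative and are not part of fastai, and the sketch assumes you already have the per-token negative log-likelihoods.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_nlls` holds -ln p(token | context) for every token in the
    evaluation text, i.e. the per-token cross-entropy in nats.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


def loss_for_perplexity(ppl: float) -> float:
    """Inverse mapping: the mean cross-entropy (nats) that yields a given perplexity."""
    return math.log(ppl)


if __name__ == "__main__":
    # A model that is uniform over a 10-word vocabulary assigns p = 0.1
    # to every token, so its perplexity equals the vocabulary size.
    uniform = [-math.log(0.1)] * 5
    print(round(perplexity(uniform), 6))  # 10.0
    # Mean per-token losses corresponding to the two headline numbers.
    print(round(loss_for_perplexity(46.81), 3))
    print(round(loss_for_perplexity(40.68), 3))
```

A uniform model over a vocabulary of V tokens has perplexity exactly V, which makes a handy sanity check. Going the other way, a perplexity of 46.81 corresponds to a mean loss of about 3.85 nats per token.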
Update: nlp-for-hindi uses SentencePiece subword tokens instead of the word-based spaCy tokenizer that I use. On those subword tokens (not directly comparable with the word-level figure above), the measured perplexity for that LM is ~35. I encourage you to check that work out as well.
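Because perplexity is normalized per token, a number computed over SentencePiece subwords and one computed over spaCy words divide a corpus-level loss by different counts. One common way to compare them is to re-express both per word, assuming the total negative log-likelihood of the same text is known. The sketch below shows that conversion. The function names and the token counts in the example are hypothetical, and it ignores effects such as a word-level model mapping rare words to an unknown token.

```python
import math


def total_nll(ppl: float, n_units: int) -> float:
    """Total negative log-likelihood (nats) implied by a perplexity over n_units tokens."""
    return n_units * math.log(ppl)


def per_word_perplexity(ppl_per_token: float, n_tokens: int, n_words: int) -> float:
    """Re-express a per-token perplexity as a per-word perplexity for the same text.

    The corpus-level NLL does not depend on how the units are counted, so
    PPL_word = exp(total_nll / n_words) = PPL_token ** (n_tokens / n_words).
    """
    if n_tokens <= 0 or n_words <= 0:
        raise ValueError("token and word counts must be positive")
    return math.exp(total_nll(ppl_per_token, n_tokens) / n_words)


if __name__ == "__main__":
    # Hypothetical counts: a subword tokenizer splits 1,000 words into 1,300 pieces.
    print(per_word_perplexity(35.0, n_tokens=1300, n_words=1000))
```

Whenever the subword count exceeds the word count (and perplexity is above 1), the per-word value comes out larger than the per-token one, so the two headline numbers cannot be ranked without an adjustment of this kind.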
This version of the notebook uses v0.7 of the fastai library, the version used in their Part 2 v2 course in Summer 2018. The best way to install it is via conda, as mentioned here.
Special thanks to Jeremy, Rachel and the other contributors to fastai. This work reproduces their English-language work for Hindi. Thanks also to @cstorm125 for thai2vec, which inspired this work.