AI4Bharat, Chennai

Creating parallel corpora for Indian languages (Nov 2021 - May 2022)

I built a bitext-mining pipeline for Samanantar 2.0, the largest publicly available collection of parallel corpora for Indian languages.