Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

📰 ArXiv cs.AI

Chitrakshara is a large multilingual, multimodal dataset for Indian languages, built to improve how Vision-Language Models (VLMs) handle those languages.

Level: advanced | Published 26 Mar 2026

Action Steps

Obtain the Chitrakshara dataset and preprocess it for your training pipeline
Use the dataset for large-scale pretraining of VLMs
Fine-tune the pretrained models on the downstream tasks you care about
Evaluate the models on Indian-language tasks, reporting results per language
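
The first and last steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Chitrakshara authors' pipeline: the record fields ("text", "script", "prediction", "reference") and the helper names are hypothetical, and script detection by Unicode block is a simple stand-in for a real language-identification step. Note that a script is not the same thing as a language (Hindi and Marathi both use Devanagari, for example).

```python
from collections import Counter, defaultdict

# Unicode block ranges for several Indic scripts (from the Unicode Standard).
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def detect_script(text):
    """Return the most frequent Indic script in `text`, or None if there is none."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def filter_indic(records):
    """Keep records whose text contains an Indic script; tag each with that script.

    Assumption: each record is a dict with a "text" field (hypothetical schema).
    """
    kept = []
    for rec in records:
        script = detect_script(rec["text"])
        if script is not None:
            kept.append({**rec, "script": script})
    return kept


def per_script_accuracy(examples):
    """Exact-match accuracy grouped by script.

    Assumption: each example has "script", "prediction", and "reference" fields.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["script"]] += 1
        if ex["prediction"] == ex["reference"]:
            correct[ex["script"]] += 1
    return {script: correct[script] / total[script] for script in total}
```

Reporting accuracy per script rather than as one pooled number keeps a high-resource language from masking weak results on a lower-resource one.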

Who Needs to Know This

AI engineers and researchers working on multimodal models, particularly those focused on Indian languages. The dataset can help them improve model performance on these languages and broaden the range of languages their models represent.

Key Insight

💡 Pretraining on the Chitrakshara dataset can improve how well Vision-Language Models (VLMs) represent Indian languages.