DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
📰 arXiv cs.AI
DanQing is a large-scale Chinese vision-language pre-training dataset of 100 million high-quality image-text pairs.
Action Steps
- Collect and preprocess large-scale image-text pairs
- Pre-train vision-language models using DanQing dataset
- Fine-tune pre-trained models on specific downstream tasks
- Evaluate and analyze the performance of fine-tuned models
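The pre-training step above typically uses a CLIP-style contrastive objective over paired image and text embeddings. The paper summary does not specify DanQing's training recipe, so the following is a minimal sketch of the standard symmetric InfoNCE loss, assuming a batch of already-encoded embeddings; the function name and temperature value are illustrative, not from the source.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays where row i of each is a matching pair.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matching pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

With well-aligned pairs the diagonal dominates the similarity matrix and the loss approaches zero; mismatched pairs drive it up, which is what pushes the encoders toward a shared embedding space during pre-training.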
Who Needs to Know This
Machine learning researchers and engineers working on vision-language models can benefit from DanQing: it provides a large-scale dataset for pre-training and fine-tuning Chinese VLP models.
Key Insight
💡 DanQing addresses the lack of high-quality, large-scale open-source data for Chinese vision-language pre-training
Share This
💡 DanQing: 100M+ high-quality Chinese image-text pairs for vision-language pre-training
DeepCamp AI