DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

📰 ArXiv cs.AI

DanQing is a large-scale Chinese vision-language pre-training (VLP) dataset containing 100 million high-quality image-text pairs.

Published 26 Mar 2026
Action Steps
  1. Collect and preprocess large-scale image-text pairs
  2. Pre-train vision-language models using DanQing dataset
  3. Fine-tune pre-trained models on specific downstream tasks
  4. Evaluate and analyze the performance of fine-tuned models
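The pre-training step above is typically driven by a CLIP-style symmetric contrastive objective over matched image-text pairs. The sketch below is a minimal, self-contained illustration of that loss in NumPy, not the paper's actual implementation; the function name, batch shapes, and temperature value are assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched
    image-text pair, as in a DanQing-style training batch (illustrative).
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))      # matched pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
b, d = 8, 32
paired = rng.normal(size=(b, d))
loss_aligned = clip_contrastive_loss(paired, paired)                 # identical pairs
loss_random = clip_contrastive_loss(paired, rng.normal(size=(b, d)))  # mismatched pairs
```

As a sanity check, perfectly aligned pairs yield a much lower loss than randomly mismatched ones, which is what drives the encoders to align the two modalities during pre-training.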
Who Needs to Know This

Machine learning researchers and engineers working on vision-language models can benefit from DanQing, as it provides a large-scale, high-quality dataset for pre-training and fine-tuning Chinese VLP models.

Key Insight

💡 DanQing addresses the lack of high-quality, large-scale open-source data for Chinese vision-language pre-training
