CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research
📰 ArXiv cs.AI
arXiv:2605.12153v1 Announce Type: cross Abstract: We present the Curated Industrial Developer Repository (CIDR), a large-scale dataset of real-world software repositories collected through direct collaboration with 12 industrial partner organizations. The dataset comprises 2,440 repositories spanning 138 programming languages and totalling 373 million lines of code, accompanied by structured per-repository metadata. Unlike existing code corpora derived from public open-source platforms, CIDR consist