Suchir Agarwal

I’m a master’s student at Stanford University studying computer science with a focus on artificial intelligence. I previously completed my undergraduate degree in computer science and pure mathematics at the University of California, Berkeley.

I’m part of the Stanford Vision and Learning Lab (SVL) under Jiajun Wu and Fei-Fei Li. At Berkeley, I was a member of Jennifer Listgarten’s lab, working on machine learning for protein engineering.

research

preprint

GPIC: A Giant Permissive Image Corpus for Visual Generation

Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang, Michael Poli, Juan Carlos Niebles, Justin Johnson, Jiajun Wu, and Li Fei-Fei

arXiv preprint, 2026

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K validation, and 1M test examples. Moreover, all GPIC images are permissively licensed for both research and commercial use. GPIC is safety-filtered, deduplicated, and centrally hosted on Hugging Face. We provide a benchmarking protocol for generative modeling on GPIC. Finally, we provide a reference baseline for pixel-space flow matching on GPIC.

arXiv | Website | Dataset | Code