Partial data cleaning
How does data cleaning affect a generative model’s ability to depict a concept? To what degree can partial cleaning guarantee that a model cannot depict CSAM?
Guaranteeing complete CSAM removal is difficult due to imperfect detection technology, the scale of training data, and moderator wellness concerns. Even a detector with 99% recall, for example, would still leave one in every hundred harmful samples in the training set. Determining whether this kind of imperfect removal can prevent text-to-image models from learning to generate CSAM is an important open question.
Existing Work
Studies of LLMs and diffusion models show that filtering training data can reduce harmful capabilities, and research in text-to-image generation shows that a critical number of training samples is required for concept composition.
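To make "partial cleaning" concrete, the sketch below filters a captioned-image dataset with an imperfect harm classifier. This is a minimal illustration, not a reference to any real pipeline: the Sample schema, the harm_score function, and the threshold are all hypothetical, and the residue calculation assumes only that the detector's recall is known.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Sample:
    """One captioned training image (hypothetical schema)."""
    image_path: str
    caption: str


def partially_clean(
    samples: Iterable[Sample],
    harm_score: Callable[[Sample], float],
    threshold: float = 0.5,
) -> Tuple[List[Sample], List[Sample]]:
    """Split a dataset into (kept, removed) using an imperfect detector.

    harm_score is assumed to return a probability-like score where
    higher means more likely harmful. Any harmful sample scoring below
    `threshold` survives cleaning -- the residue that makes partial
    cleaning partial.
    """
    kept: List[Sample] = []
    removed: List[Sample] = []
    for sample in samples:
        if harm_score(sample) >= threshold:
            removed.append(sample)
        else:
            kept.append(sample)
    return kept, removed


def expected_residue(n_harmful: int, recall: float) -> float:
    """Expected number of harmful samples the filter misses."""
    return n_harmful * (1.0 - recall)
```

Lowering the threshold raises recall but also removes more benign data; the open question is whether the residue left at any feasible operating point still supplies enough samples for a model to learn to compose the concept.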
Limitations
Preventing AIG-CSAM generation requires strong safety guarantees. While existing work is a starting point, the problem would benefit from formal bounds on a model's ability to generate harmful content. And because it is illegal for researchers to generate CSAM, it is unclear how models' capabilities could be exhaustively tested.