Purging diffusion models through CLIP-based fine-tuning
Received date: 2024-01-11
Online published: 2025-01-20
Diffusion models have revolutionized text-to-image synthesis, enabling users to generate high-quality and imaginative artwork from simple natural-language prompts. Unfortunately, because their training datasets are large and unfiltered, these models can also generate inappropriate content such as nudity and violence. To make such models safer to deploy, we propose a novel method, directional contrastive language-image pre-training (CLIP) loss-based fine-tuning, dubbed CLIF. CLIF uses a directional CLIP loss to suppress the model's ability to generate inappropriate content, and it is lightweight and immune to circumvention. To demonstrate its effectiveness, we also propose a benchmark, categorized toxic prompts (CTP), for evaluating how readily text-to-image diffusion models produce inappropriate content. Experiments on CTP and the common objects in context (COCO) dataset show that CLIF significantly suppresses inappropriate generation while preserving the model's ability to produce general content.
Key words: text-to-image generative models; security; datasets; diffusion models
Ping WU, Xin LIN. Purging diffusion models through CLIP-based fine-tuning[J]. Journal of East China Normal University (Natural Science), 2025, 2025(1): 138-150. DOI: 10.3969/j.issn.1000-5641.2025.01.011
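The abstract gives no implementation details, but a directional CLIP loss of the kind it describes (popularized by StyleGAN-NADA for CLIP-guided generator adaptation) aligns the shift between two image embeddings with the shift between two text embeddings. Below is a minimal PyTorch sketch for illustration only; the function name, tensor shapes, and the toxic-to-safe source/target pairing are assumptions made for this example, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def directional_clip_loss(img_src: torch.Tensor,
                              img_gen: torch.Tensor,
                              txt_src: torch.Tensor,
                              txt_tgt: torch.Tensor) -> torch.Tensor:
        """Directional CLIP loss: 1 - cos(image-embedding shift, text-embedding shift).

        All arguments are CLIP embeddings; img_* have shape (batch, dim),
        txt_* have shape (1, dim) or (batch, dim).
        """
        d_img = F.normalize(img_gen - img_src, dim=-1)  # direction of image change
        d_txt = F.normalize(txt_tgt - txt_src, dim=-1)  # direction of text change
        return (1.0 - (d_img * d_txt).sum(dim=-1)).mean()

    # Toy usage with random stand-in embeddings (hypothetical shapes):
    if __name__ == "__main__":
        batch, dim = 4, 512                   # 512 = CLIP ViT-B/32 embedding size
        loss = directional_clip_loss(
            torch.randn(batch, dim),          # embeddings of the frozen model's images
            torch.randn(batch, dim),          # embeddings of the fine-tuned model's images
            torch.randn(1, dim),              # text embedding of a toxic concept
            torch.randn(1, dim),              # text embedding of a safe concept
        )
        print(f"directional CLIP loss: {loss.item():.4f}")

One plausible use, consistent with the abstract, is to minimize such a term during fine-tuning so that images generated from toxic prompts move in CLIP embedding space toward a safe target concept, while a separate preservation objective (e.g., on ordinary COCO-style captions) keeps general generation intact.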