Purging diffusion models through CLIP-based fine-tuning
Received date: 2024-01-11
Online published: 2025-01-20
Fund: Open Project of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, Ministry of Education; Science and Technology Commission of Shanghai Municipality (21511100101)
WU Ping, LIN Xin. Purging diffusion models through CLIP-based fine-tuning [J]. Journal of East China Normal University (Natural Science), 2025, 2025(1): 138-150. DOI: 10.3969/j.issn.1000-5641.2025.01.011
Diffusion models have revolutionized text-to-image synthesis, enabling users to generate high-quality and imaginative artworks from simple natural-language prompts. Unfortunately, because the training datasets are large and unfiltered, these models can generate inappropriate content such as nudity and violence. To deploy such models at a higher level of safety, we propose a novel method, directional contrastive language-image pre-training (CLIP) loss-based fine-tuning, dubbed CLIF, which uses a directional CLIP loss to suppress the model's ability to generate inappropriate content. CLIF is lightweight and immune to circumvention. To demonstrate its effectiveness, we propose a benchmark called categorized toxic prompts (CTP) for evaluating the inappropriate-content generation ability of text-to-image diffusion models. As our experiments on CTP and the common objects in context (COCO) dataset show, CLIF significantly suppresses inappropriate generation while preserving the model's ability to produce general content.
Key words: text-to-image generative models; security; datasets; diffusion models