Journal of East China Normal University (Natural Science) ›› 2025, Vol. 2025 ›› Issue (1): 138-150. doi: 10.3969/j.issn.1000-5641.2025.01.011

• Computer Science •

Purging diffusion models through CLIP based fine-tuning

Ping WU, Xin LIN*

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2024-01-11 Online: 2025-01-25 Published: 2025-01-20
  • Contact: Xin LIN E-mail: xlin@cs.ecnu.edu.cn

Abstract:

Diffusion models have revolutionized text-to-image synthesis, enabling users to generate high-quality and imaginative artworks from simple natural-language prompts. Unfortunately, because their training datasets are large and unfiltered, these models can also generate inappropriate content such as nudity and violence. To deploy such models at a higher level of safety, we propose a novel fine-tuning method based on a directional contrastive language-image pre-training (CLIP) loss, dubbed CLIF. The directional CLIP loss is used to suppress the model’s ability to generate inappropriate content; CLIF is lightweight and immune to circumvention. To demonstrate the effectiveness of CLIF, we also propose a benchmark, categorized toxic prompts (CTP), for evaluating the inappropriate-generation behavior of text-to-image diffusion models. As shown by our experiments on the CTP and common objects in context (COCO) datasets, CLIF significantly suppresses inappropriate generation while preserving the model’s ability to produce general content.
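The abstract does not give the loss formula, but a directional CLIP loss is commonly defined as one minus the cosine similarity between the edit direction in CLIP image space and the corresponding direction in CLIP text space. The following is a minimal NumPy sketch under that assumption; the function name and the use of precomputed embeddings (rather than live CLIP encoders) are illustrative, not taken from the paper.

```python
import numpy as np

def directional_clip_loss(img_emb_edit, img_emb_src, txt_emb_tgt, txt_emb_src):
    """Directional CLIP-style loss on precomputed embeddings (illustrative).

    Encourages the change in image embeddings (edited minus source) to point
    in the same direction as the change in text embeddings (target minus
    source prompt), e.g. steering generations from a toxic concept toward a
    safe one. Inputs are arrays of shape (batch, dim).
    """
    d_img = img_emb_edit - img_emb_src          # edit direction in image space
    d_txt = txt_emb_tgt - txt_emb_src           # desired direction in text space
    cos = np.sum(d_img * d_txt, axis=-1) / (
        np.linalg.norm(d_img, axis=-1) * np.linalg.norm(d_txt, axis=-1) + 1e-8
    )
    return float(np.mean(1.0 - cos))            # 0 when directions align
```

In an actual fine-tuning loop the embeddings would come from frozen CLIP image and text encoders, and this loss would be minimized with respect to the diffusion model's parameters only.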

Key words: text-to-image generative models, security, datasets, diffusion models
