J* E* C* N* U* N* S* ›› 2025, Vol. 2025 ›› Issue (5): 183-190. doi: 10.3969/j.issn.1000-5641.2025.05.017

• Ethics, Laws, and Security in Open Source and AI

Brief discussion on fair use for distribution of open-source large model datasets

Yunhu ZHAO1, Yuzhou YANG2, Lin QIN2

  1. Open-Source Innovation and Digital Governance Research Institute, Shanghai University of International Business and Economics, Shanghai 200120, China
  2. Department of Intellectual Property Rights, Beijing Dacheng Law Offices, LLP (Shanghai), Shanghai 200120, China
  • Received: 2025-01-22 Accepted: 2025-08-06 Online: 2025-09-25 Published: 2025-09-25

Abstract:

Opening up large models requires sharing not only conventional software elements, such as model architectures and training code, but also model parameters and datasets. Applying the “four-factor test” and the “three-step test”, and taking into account both the transformative nature and purpose of distributing datasets under open licenses and the public interest in technological development and application, one may conclude that distributing datasets for open-source large models constitutes fair use, so that copyright licenses from upstream right holders need not be obtained. This approach satisfies governance requirements for artificial-intelligence transparency and actively promotes knowledge sharing.

Key words: artificial intelligence act, open-source large model, licensing, dataset, fair use
