Journal of East China Normal University (Natural Science) ›› 2025, Vol. 2025 ›› Issue (5): 183-190. doi: 10.3969/j.issn.1000-5641.2025.05.017

• Ethics, Law, and Security of Open Source and AI •

略论开源大模型数据集分发的合理使用

赵云虎1, 杨宇宙2, 秦琳2

  1. 上海对外经贸大学 开源创新与数字治理研究院, 上海 200120
    2. 北京大成(上海)律师事务所 知识产权部, 上海 200120
  • 收稿日期:2025-01-22 接受日期:2025-08-06 出版日期:2025-09-25 发布日期:2025-09-25
  • About the author: Yunhu ZHAO, male, distinguished professor; research interest: intellectual property. E-mail: yunhu.zhao@dentons.cn

Brief discussion on fair use for distribution of open-source large model datasets

Yunhu ZHAO1, Yuzhou YANG2, Lin QIN2

  1. Open-Source Innovation and Digital Governance Research Institute, Shanghai University of International Business and Economics, Shanghai 200120, China
    2. Department of Intellectual Property Rights, Beijing Dacheng Law Offices, LLP (Shanghai), Shanghai 200120, China
  • Received:2025-01-22 Accepted:2025-08-06 Online:2025-09-25 Published:2025-09-25

摘要:

大模型的开源不仅需要开放传统的计算机软件形式的模型架构、训练代码等, 也需要开放模型的参数和数据集. 根据“四要素分析法”和“三步检验法”的分析框架, 尤其是考虑到以开放许可证分发的数据集具有转换性使用的性质和目的, 以及对于科技发展和应用的公共利益, 可以认定开源大模型数据集的分发属于合理使用, 不需要上游权利人的著作权许可. 这样, 既满足了对于人工智能透明度的治理要求, 也具有促进知识共享的积极作用.

关键词: 人工智能法, 开源大模型, 许可证, 数据集, 合理使用

Abstract:

Open-sourcing a large model requires releasing not only conventional software components, such as the model architecture and training code, but also the model parameters and datasets. Under the analytical frameworks of the “four-factor test” and the “three-step test,” and considering in particular the transformative nature and purpose of datasets distributed under open licenses as well as the public interest in technological development and application, the distribution of open-source large model datasets can be regarded as fair use and does not require a copyright license from upstream right holders. This conclusion both satisfies governance requirements for artificial intelligence transparency and promotes knowledge sharing.

Key words: artificial intelligence law, open-source large model, license, dataset, fair use

CLC number: