Brief discussion on fair use for distribution of open-source large model datasets

Journal of East China Normal University (Natural Science), 2025, Vol. 2025, Issue (5): 183-190. doi: 10.3969/j.issn.1000-5641.2025.05.017

• Ethics, Law, and Security of Open Source and AI •

About the author: Yunhu ZHAO, male, distinguished professor, research interest: intellectual property. E-mail: yunhu.zhao@dentons.cn

Brief discussion on fair use for distribution of open-source large model datasets

Yunhu ZHAO¹, Yuzhou YANG², Lin QIN²

1. Open-Source Innovation and Digital Governance Research Institute, Shanghai University of International Business and Economics, Shanghai 200120, China
2. Department of Intellectual Property Rights, Beijing Dacheng Law Offices, LLP (Shanghai), Shanghai 200120, China

Received: 2025-01-22; Accepted: 2025-08-06; Online: 2025-09-25; Published: 2025-09-25


Abstract:

Open-sourcing a large model requires releasing not only the model architecture, training code, and other components that take the conventional form of computer software, but also the model parameters and datasets. Under the analytical frameworks of the “four-factor test” and the “three-step test”, and considering in particular the transformative nature and purpose of datasets distributed under open licenses, as well as the public interest in technological development and application, the distribution of datasets for open-source large models can be found to constitute fair use and therefore does not require a copyright license from upstream right holders. This conclusion both satisfies governance requirements for artificial-intelligence transparency and helps promote knowledge sharing.

Key words: artificial intelligence act, open-source large model, licensing, dataset, fair use

CLC number: D923; TP311.5

Cite this article: Yunhu ZHAO, Yuzhou YANG, Lin QIN. Brief discussion on fair use for distribution of open-source large model datasets[J]. Journal of East China Normal University (Natural Science), 2025, 2025(5): 183-190.


