Reproducibility is the New Copyleft: Defining AGI-oriented Reproducible Builds
Masayuki Hatta
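AI summary: This paper proposes AGI-oriented reproducible builds as a functional equivalent of copyleft, defines seven requirements intended to ensure bit-exact reproducibility of a model from declared inputs to outputs, and argues that protocols rather than platforms are the better governance framework.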
Details
- Comments: Accepted at AGI-26. To appear in the proceedings (Springer LNCS)
Copyleft, as implemented in licenses such as the GNU General Public License, was a legal hack that used copyright to guarantee user freedom by tying the availability of source code to every act of distribution. Its normative force rested on an implicit technical premise: that source code and object code stand in a well-defined, humanly auditable, and reproducible relationship. Large language models and, prospectively, Artificial General Intelligence (AGI) systems systematically violate this premise. The artifacts jointly required to reconstruct a model -- code, data, weights, hyperparameters, toolchain, and hardware configuration -- are each subject to independent legal, technical, and economic constraints that no current open-source framework fully resolves. Sufficiently capable AI systems can also rewrite licensed source into functionally equivalent derivatives stripped of their original obligations, a form of laundering against which copyleft has no effective defense. This paper argues that a functional analogue of copyleft for AGI must be grounded not in share-alike clauses over code, but in reproducible builds: a practice guaranteeing bit-exact reconstructability from declared inputs. We review the logic of copyleft, critically examine Maffulli's Second Liberation thesis according to which AI fulfills Stallman's dream, and show that the argument collapses unless AGI systems are themselves reproducible. Drawing on the Open Source AI Definition (OSAID), the Model Openness Framework (MOF), OpenMDW, and deterministic-inference research, we define seven requirements for AGI-oriented reproducible builds. We further argue that the Model Context Protocol (MCP) and analogous AI-to-AI coupling mechanisms constitute a new dynamic linking layer for which copyleft-style licensing is ill-suited, and that Masnick's "protocols, not platforms" framework offers a more promising governance template.
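To make "bit-exact reconstructability from declared inputs" concrete, the following is a minimal illustrative sketch, not the paper's method. It assumes the six artifact categories named in the abstract, and it assumes SHA-256 over a canonical JSON manifest as the fingerprinting scheme. The names `ARTIFACT_KINDS`, `manifest_digest`, and `is_bit_exact` are hypothetical and do not come from the paper or from any existing tool.

```python
import hashlib
import json

# Hypothetical artifact categories, taken from the list in the abstract.
ARTIFACT_KINDS = ("code", "data", "weights", "hyperparameters", "toolchain", "hardware")


def digest(blob: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(blob).hexdigest()


def manifest_digest(inputs: dict[str, bytes]) -> str:
    """Fingerprint the declared inputs in a canonical, order-independent form.

    Assumption: every category must be declared; keys outside
    ARTIFACT_KINDS are ignored in this sketch.
    """
    missing = [kind for kind in ARTIFACT_KINDS if kind not in inputs]
    if missing:
        raise ValueError(f"undeclared inputs: {missing}")
    manifest = {kind: digest(inputs[kind]) for kind in ARTIFACT_KINDS}
    # Sorted keys and fixed separators give one byte sequence per manifest.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return digest(canonical.encode("utf-8"))


def is_bit_exact(build_a: bytes, build_b: bytes) -> bool:
    """Two independent builds reproduce only if their outputs are byte-identical."""
    return digest(build_a) == digest(build_b)
```

Under these assumptions, a third party would recompute `manifest_digest` over the published inputs, rerun the build, and compare the result with the distributed artifact using `is_bit_exact`. Any difference in the inputs or in a single output byte causes the check to fail, which is the property the abstract attributes to reproducible builds.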