In our previous post in this series, we talked about how AI is beginning to change the way software is developed. In this follow-up, we focus on some of the main legal (or quasi-legal) issues that open source developers themselves have been raising regarding AI-assisted development. 

This isn’t a comprehensive overview of every legal issue connected to AI. We aren’t addressing, for example, customer concerns about compliance with AI regulations or liability issues relating to contracts for AI-powered products. Instead, we’re focusing on issues that are being actively debated inside open source communities. 

Our views on these issues reflect our commitment to responsible use of AI technologies and our “default to open” philosophy. We believe that collaborative and transparent approaches are the best ways to address these concerns constructively.

Attribution and marking 

Attribution is a core legal and cultural norm in open source. Licenses generally require you to preserve copyright and authorship notices, and to avoid misleading claims of authorship. 

AI-assisted development complicates this. Because AI systems are not considered “authors” under copyright law, there is technically no one to credit. But it would still be misleading for a developer to present substantial AI-generated output as purely their own work. 

That’s why a growing number of open source projects are adopting disclosure rules for AI-assisted contributions, drawing inspiration from disclosure norms in other fields, such as labeling synthetic media. “Marking” contributions helps preserve both legal clarity and community trust, and makes it easier for reviewers to evaluate the code in context.

We support marking, but we don’t think it should be overly prescriptive. Relatively trivial uses of AI (like autocompleting a variable name or suggesting a docstring) shouldn’t require disclosure. For more substantial uses, marking can be as simple as a source code comment, a note in a merge request, or a commit trailer such as Assisted-by: (other candidates used by some projects include Generated-by: and Co-authored-by:).
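As a minimal sketch (the tool name, wording, and trailer choice here are illustrative, not a prescribed format), a commit message for a substantial AI-assisted change might look like this:

    Add retry logic to the download helper

    The backoff implementation was generated with an AI coding assistant,
    then reviewed, adapted, and tested by hand.

    Assisted-by: <AI tool name and version>
    Signed-off-by: Jane Developer <jane@example.com>

A side benefit of trailers is that they are machine-readable, so a project can later inventory its AI-assisted commits with ordinary git tooling.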

Copyright and licensing formalities

As important as attribution may be, open source depends even more heavily on clear license grants. This raises a practical question: how should license notices work when a contribution includes noncopyrightable AI-generated material?

In most cases, where license notices already exist in a repository or individual source file, nothing should change. Because of the highly functional nature of code, source files are already generally a mix of copyrightable and noncopyrightable material, and open source license grants apply only to the parts that are copyright-protected. For substantial AI-generated contributions, disclosure through marking complements existing license notices and is the right way to avoid misleading anyone. 
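To illustrate (a sketch only; the comment style, license identifier, and wording are project choices rather than any standard), an existing license notice can simply sit alongside a marking comment in the same file:

    // SPDX-License-Identifier: Apache-2.0
    // Copyright The Example Project Contributors
    //
    // Note: the parsing routine below was substantially generated with an
    // AI coding assistant and has been reviewed by the contributors.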

The harder case is when an entire source file, or even an entire repository, is generated by AI. Here, adding a copyright and license notice may be inappropriate unless and until human contributions transform the file into a copyrightable work. But given the norm that open source repositories should have a global LICENSE file, it is reasonable to add a familiar ultra-permissive open source license (for example, the Unlicense) as the global license of an AI-generated repository, even though technically such licenses assume copyright exists. As human contributions are added, maintainers can revisit this initial license choice; because of the lack of previous human contributors, this will be easier than the typical scenario in which an open source project is relicensed. We expect practices to evolve with both changes in the law and greater community experience with AI tools.  
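One way to implement this, offered as a sketch rather than settled practice, is to pair the global LICENSE file with a short disclosure in the repository’s README (the wording here is illustrative):

    This repository was initially generated with the assistance of an AI
    coding tool. To the extent the generated material is not protected by
    copyright, no license is needed to use it; any copyrightable human
    contributions are made available under the terms of the LICENSE file
    (the Unlicense).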

Are AI tools “plagiarism machines”? 

Some open source developers are skeptical, and sometimes even hostile, toward AI-assisted development, accusing AI models of being “plagiarism machines” or “copyright laundering” mechanisms. 

There are two versions of this concern. The first is practical: that an AI tool could covertly insert excerpts of proprietary (or license-incompatible) code into an open source project, potentially creating legal risk for maintainers and users. The second is broader and more philosophical: that large language models, trained on vast amounts of open source software, are essentially misappropriating the community’s work, producing outputs stripped of the obligations that open source licenses require.   

We think these concerns deserve to be taken seriously. It is true that large language models are capable, in some cases, of emitting nontrivial excerpts of their training data. If that were a frequent or unavoidable behavior, it would be a good reason to avoid using these tools altogether. 

But the evidence suggests otherwise. When GitHub Copilot was released, there were widely publicized claims that its suggestions copied from open source projects. Where those claims were substantiated at all, they typically involved deliberate efforts to coax the tool into reproducing known code verbatim, which is not an ordinary use. Since then, we have not seen credible evidence that widely used AI development tools systematically replicate portions of training data that are substantial enough to raise copyright concerns.

The misconception underlying much of the “plagiarism machine” narrative is that generative AI models are a kind of lossy compression of their training data. In reality, the normal behavior of models is to generate novel text based on statistical patterns they have learned. The fact that they are trained on open source code does not mean their output is a reproduction of that code. 

That said, the possibility of occasional replication cannot be ignored. Developers using AI tools should remain attentive to this risk, and treat AI-generated output as something to be reviewed with the same care as any other contribution. Where AI development tools provide functionality to detect or flag lengthy suggestions that match existing open source code, those features should be enabled. Combined with disclosure practices and human oversight, these steps are a practical way to mitigate the replication concern without treating all AI use as inherently tainted. 

AI-assisted contributions and the DCO

Projects that use the Developer Certificate of Origin (DCO) have raised particular concerns about AI-assisted contributions. The DCO, which we’ve long recommended as an open source development best practice, requires contributors to certify that they have the right to submit their work under the project’s license. Some developers argue that, because AI tool outputs may include unknown or undisclosed material, no one can legitimately make the DCO signoff for AI-assisted code. This view has led some DCO-using projects to prohibit AI-assisted contributions altogether. 

We understand this concern, but the DCO has never been interpreted to require that every line of a contribution be the personal creative expression of the contributor or another human developer. Many contributions contain routine, noncopyrightable material, and developers still sign off on them. The real point of the DCO is responsibility: the contributor certifies a good-faith belief that they have the right to submit the contribution under the project’s open source license (as to its copyrighted elements), and maintainers reasonably expect that the contributor has done some due diligence before making that certification. With disclosure, human attentiveness and oversight, aided where possible by tools that check for code similarity, AI-assisted contributions can be entirely compatible with the spirit of the DCO.
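In practice, the DCO sign-off and an AI-use marker can travel together in a single commit. Here is a minimal sketch using standard git options (the Assisted-by trailer is one convention among several, not a git-defined name):

    # -s / --signoff adds the DCO "Signed-off-by:" trailer;
    # --trailer (Git 2.32+) appends an additional custom trailer
    git commit -s --trailer "Assisted-by: <AI tool name>" \
        -m "Fix bounds check in parser"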

None of this is to say that projects must allow AI-assisted contributions. Each project is entitled to make its own rules and set its own comfort level, and if a project decides to prohibit AI-assisted contributions for now, that decision deserves respect. Projects opting to take this path should recognize, though, that the concerns they are voicing are not new or unique to AI. For years, risk-averse commercial users of open source worried about “laundered” code: contributions hiding copyrighted material under undisclosed, problematic terms. Over time, those fears proved to be unfounded. It is not impossible that an AI-assisted contribution could contain undisclosed copyrighted material, but experience suggests it is a manageable risk, and not one categorically different from the challenges open source has faced and dealt with in the past.

In other words, the DCO can remain what it has always been: a practical and effective tool for maintaining trust and legal clarity in open source development, even in the age of AI.

Establishing trust

Underlying much of the discussion around AI in software development, whether legal, technical, or ethical, is the question of trust. Trust is a fundamental human concern that is essential to any successful open source project. The introduction of AI into open source development raises new issues of trust across several dimensions: trust that contributors are using AI responsibly, that those who do so are not stigmatized, and that the companies building and encouraging the use of AI are doing so in ways that serve the public good. Acknowledging that these companies, including Red Hat, have a commercial interest in the success of AI is also a critical part of being transparent about their role in this technological transformation.

The challenge of building trust in technology is not new. Ken Thompson’s seminal 1984 lecture “Reflections on Trusting Trust” remains a touchstone for understanding how deeply human judgment and institutional integrity underpin software itself. AI brings these concepts back into sharp relief. Trust must still be earned through consistent and visible actions. Red Hat values the trust we’ve built with upstream communities, and we believe our open source development model, grounded in transparency, collaboration, and accountability, remains the best way to sustain it as we navigate the future of AI and open source together.

Looking ahead

The issues we’ve discussed here – marking, license notices, training data replication concerns, and the DCO – are the kinds of legal questions we find open source developers wrestling with most today. With disclosure of AI use, human oversight, and respect for project rules, AI-assisted development can be reconciled with both the legal foundations and the cultural values of open source. We welcome collaboration in upstream projects on these and other approaches to balancing those interests. Each project should be free to make its own choices, and open source communities will be stronger if they address these issues themselves rather than standing aside from them.
