AI: a fork in the road for open source

21 April 2025

Pandora's box is open. Increasing amounts of code running on my computer and in the online services I use will be written by generative AI. By now, this future is inevitable. You and I must grapple with it regardless of whether we particularly like it.

This is not another front in the war of proprietary vs open source. At first glance, proprietary software businesses have more obvious incentives to use GenAI. They save money if they can draw on open source code without having to respect its licence, and within certain bounds they can trade off code quality to get features to market quickly and achieve their version of success.

However, many open source developers will similarly consider success to be "code that works and is available" without any regard to purity. Already pressed for volunteer time, "why not make it faster?" they will reason. Perhaps there is satisfaction in using capital's outputs against it. But even if a particular open source project disavows AI, contributions can function as a status symbol and be a significant factor in hiring, so individual developers have incentives to use GenAI and lie about it when convenient.

Today we have a large amount of actively maintained open source software. The most popular and successful projects have been written by talented programmers who have no particular need for GenAI to do a great job. With their code being ripped off to drive these algorithms, it's natural that this community is where we see the strongest opposition. This discussion on the Servo repo is a good example.

Unfortunately, Servo ultimately has little power over the situation. There will be more and more pull requests that used GenAI in part or in full, with varying degrees of honesty. Who pays the cost here? Maintainers, as usual. This post by Daniel Stenberg (author of curl) paints a good picture. Random people on the internet can now easily produce technical artifacts that look plausible but are actually garbage, wasting the time of the people who are actually responsible for the thing. Maintainers who are probably already time-poor and burnt out will additionally be forced to sleuth out whether each PR is AI-generated, and to have arguments with those they accuse.

The labour imbalance is the observable problem. The other problem is how training the models strips the licences from the open source code used as input data. Opinions vary on how valid this complaint is. I have an opinion, of course, but it's more important to realise that resolving the question no longer particularly matters. It looks like the AI companies are going to get away with strip-mining the entire web and selling derivative works for a monthly fee. Even if legal opinion suddenly changed, freely downloadable models already exist and will now always exist, many of them able to run on a reasonably powerful PC in the privacy of one's home. The situation is effectively irreversible. Whether they tell themselves that their GenAI usage "isn't real infringement", or whether they're pleased that they now have a way to infringe without easily getting caught, from now on there will be a healthy supply of GenAI users who are uninterested in the moral issues.

In a sense nothing has changed. You don't need Claude or ChatGPT in order to plagiarise one open source project and submit it as your own commit to another. There is a significant difference of scale, however. The ease of engaging GenAI, the vast range of projects to pilfer from, and the speed at which one can commit plagiarism in different programming contexts over and over utterly dwarf what any unscrupulous developer could do under their own steam. It's easier and more effective, so we can be sure it will happen more.

Given all this, the big question is: what can open source projects and maintainers actually do in this brave new world? The Servo discussion linked above shows a project trying to find consensus, some position that reflects the values of that community. In my opinion these attempts are doomed to fail, not because consensus is a bad idea, but because GenAI is too insidious and the incentives are too perverse. If a project chooses a norm that is anything other than "GenAI is fine, just use it well," it's going to be unenforceable and continuously undermined. As I see it, that leaves four options:

  1. Embrace the AI. This implies comfort with the intellectual property situation. Maintainers can focus on testing and verification strategies that assume some code is generated, with the associated risks. They may also use their own GenAI automation to cope with an increased volume of PRs or to review larger ones.
  2. Cling to known humans who write their own code. This will be easier for smaller established projects with collaborators who know each other personally. With more work it could be bootstrapped if similarly-minded folks start projects together. Even if contributions from the public are considered for inclusion, hosting on a private forge may reduce targeting from people who are trying to juice up their GitHub profile.
  3. Reject external contributions. This is where I'm currently at. My projects are solo efforts and basically unimportant. I have no interest in spending my free time reviewing something that someone didn't write themselves, no way of knowing whether that's the case, and on principle I don't want my projects' copyright status polluted by who-knows-what models. My personal interest in sharing code is served best by rejecting all external contributions. At the very least I have to accept that my code will be used for training without my permission simply by making it available on the internet.
  4. Write proprietary software instead. Nobody sends you PRs, because they can't, and you might even be able to make a buck along the way.

My prediction is that (1) will become the dominant form of open source, with a long tail of grumpy holdouts who adhere to free software principles. Most businesses use and participate in open source specifically because they want to take advantage of code and contributions without having to pay for them. GenAI simply lubricates the process with the grease of GPL and other projects, at a lower labour cost. People who want to improve their hireability at these same companies will gladly use the tools. For new generations of developers entering the industry, AI-positive repos will be what they see first, and possibly all they ever know.

For those who wish it were different… there will still be pathways to working on software without AI breathing down our necks, but our choices will be more constrained. Let us try not to exhaust all of our passion tilting at windmills.


Serious Computer Business Blog by Thomas Karpiniec