AI slop killed the open-source bug bounty

By Ritabrata Maiti · 7 min read

Turso, the SQLite-compatible database written in Rust, retired its bug bounty program this week. The post explaining why is titled, dryly, “The wonders of AI.” For about a year the project had been paying $1,000 for each critical vulnerability someone reported in the codebase. Budget wasn’t the problem. The problem was that most of the reports landing in the maintainer’s inbox were written by an LLM, and the maintainer was spending most of his bounty hours reading text that no human had really intended for him to read.

The examples are funny in a bleak way. One submission claimed to have found a critical flaw that let an attacker execute arbitrary SQL statements. Against a SQL database. Another report on the same project described a buffer overflow whose reproduction steps included editing Turso’s own source code, recompiling with a forced volatile write past the end of a vector, and running the modified binary. The “vulnerable code paths” did not exist. The “exploits” did not reproduce.

So the maintainers ended the program. That is a small story about one database, and also a very large story about every queue on the internet that involves humans on one side and an open submission form on the other. Every maintainer who has read Turso’s post this week has recognised their own inbox.

The economics of a bug bounty are tidy. The project says: if you find a real problem that meets this severity bar, we pay you $X. Both sides win. The hunter gets paid for genuinely useful work. The project gets a security audit it could not have afforded any other way.

That contract rested on a quiet assumption. The cost of submitting a report was supposed to be in the same neighbourhood as the cost of producing one. Producing a real exploit is hard. Writing one up takes hours. So submission volume self-throttled. The queue stayed roughly the size of the real-findings rate, which is the rate a small team can actually triage.

Cheap inference broke that assumption in about eighteen months.

It now costs roughly a tenth of a cent to ask a frontier model to produce a vulnerability report against a public codebase. The model will produce one. The report will look real enough that a human has to spend about ten minutes reading it before concluding that it isn’t. Submission is effectively free. Triage costs the most expensive thing a software project has, which is the unbroken attention of the one person who understands the code.

You don’t need many people running that arbitrage to break a small program. Once the submission rate climbs past the triage rate, the queue does not stabilise. It grows until the maintainer either gives up reading or gives up the program. Turso gave up the program.
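
The runaway-queue claim is easy to check with a toy model. This is a minimal sketch, not a measurement: the function name and the rates (a maintainer who can triage 6 reports a day, facing either 4 or 20 arrivals a day) are made-up numbers chosen only to show the two regimes.

```python
def backlog_over_time(arrivals_per_day: float, triaged_per_day: float, days: int) -> list[float]:
    """Return the unread-report backlog at the end of each day."""
    backlog = 0.0
    history = []
    for _ in range(days):
        # New reports arrive, then the maintainer clears as many as they can.
        # The backlog can never go below zero.
        backlog = max(0.0, backlog + arrivals_per_day - triaged_per_day)
        history.append(backlog)
    return history


# Hypothetical rates: triage capacity is 6 reports a day in both runs.
below_capacity = backlog_over_time(arrivals_per_day=4, triaged_per_day=6, days=30)
above_capacity = backlog_over_time(arrivals_per_day=20, triaged_per_day=6, days=30)

print(below_capacity[-1])  # 0.0: the queue stays empty
print(above_capacity[-1])  # 420.0: the queue grows by 14 every day
```

Below capacity the backlog sits at zero. Above it, the backlog grows by the difference between the two rates every single day, and nothing in the model ever pulls it back down.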

The bug bounty story is just one shape of a pattern that is now visible in a dozen places.

Drive-by pull requests are flooding popular GitHub repos. Most are LLM-generated typo fixes or hallucinated refactors. Maintainers either auto-close everything and lose the legitimate ones, or burn out on triage. Open issue trackers are seeing the same flood of “I think there might be a bug” reports with no reproduction, often filed by a chatbot middle layer that promised a confused user it would “let the developers know.” Code review comments are now arriving thirty at a time on small PRs, mostly tautologies and hedges. The comment threads on technical blog posts are heading the same way.

The shape is always the same. Producing the input costs nothing. Processing the input still costs a person.

We have automated finding bugs. We have automated submitting bugs. This year we are automating rejecting bugs. Nobody is automating fixing them.

A lot of the early excitement about LLMs in software was that they could write code very fast. That turned out to be true, and it turned out to be the less interesting half of the problem. Writing the code was never the bottleneck. Reading the code is the bottleneck. Reviewing the code, understanding the code, deciding whether to merge the code, deciding whether to ship the code, deciding what to do when the code breaks at 2am. The reading-and-deciding loop is where engineering actually happens, and it has always been bounded by how many things a human can hold in their head at once.

LLMs add to the writing side of that loop and add to the input side of every queue feeding into it. They don’t move the reading-and-deciding bottleneck. They make it the limiting factor for almost everything.

This is why “just put an AI in the triage loop” doesn’t fix the Turso problem. If there’s a model deciding which reports a human sees, you’ve created a new game: produce reports that pass the model. That game is also cheap to play. The judgment step that the attacker is trying to commodify is the exact step that has to stay with a person, because that’s what was being attacked in the first place.

A handful of patterns are starting to circulate in maintainer conversations. None are clean.

A refundable submission deposit is the most-discussed option. You put down a small amount, say $20. You get it back if the report holds up. You lose it if it’s slop. The economics work. The downside is that it filters out the hobbyist newcomer who doesn’t have a card on file, which is exactly the population the open programs were designed to find.
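
To see why the economics work, here is a rough expected-value sketch for a single submission. The $1,000 bounty, the $20 deposit, and the tenth-of-a-cent generation cost come from the figures above. The hit rate for generated reports (1 in 1,000) is purely an assumption for illustration, and the function names are mine.

```python
def expected_value(p_valid: float, bounty: float, deposit: float, cost_to_produce: float) -> float:
    """Expected payoff of one submission, in dollars.

    A valid report earns the bounty and the deposit is refunded.
    An invalid report forfeits the deposit.
    """
    return p_valid * bounty - (1 - p_valid) * deposit - cost_to_produce


def break_even_probability(bounty: float, deposit: float, cost_to_produce: float) -> float:
    """Hit rate at which expected_value() is exactly zero."""
    # Solve p*bounty - (1 - p)*deposit - cost = 0 for p.
    return (deposit + cost_to_produce) / (bounty + deposit)


BOUNTY = 1000.0     # from the post: $1,000 per critical finding
DEPOSIT = 20.0      # from the post: "say $20"
SLOP_COST = 0.001   # from the post: roughly a tenth of a cent per report
P_SLOP = 0.001      # ASSUMPTION: 1 in 1,000 generated reports is accidentally valid

no_deposit = expected_value(P_SLOP, BOUNTY, 0.0, SLOP_COST)
with_deposit = expected_value(P_SLOP, BOUNTY, DEPOSIT, SLOP_COST)
needed_hit_rate = break_even_probability(BOUNTY, DEPOSIT, SLOP_COST)

print(f"no deposit:   {no_deposit:+.3f} dollars per submission")
print(f"with deposit: {with_deposit:+.3f} dollars per submission")
print(f"break-even hit rate with deposit: {needed_hit_rate:.1%}")
```

Under these assumed numbers, spraying reports is mildly profitable with no deposit and loses money on every submission with one. A submitter would need to be right roughly one time in fifty before the deposit stops hurting, which a hunter with a real reproduction clears easily.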

Reputation gating goes the other way. First-time submitters have to clear a higher bar, including a full reproduction and a vouching introduction. Established hunters skip the bar. This rebuilds an apprenticeship model that hasn’t really existed in software security for a generation, but it raises the cost of discovering new talent.

A few projects have started running honeypots. The clearest example is UnsafeLabs/Bounty-Hunters, which publishes a bounty-shaped repo that exists mainly to attract automated scanners. The submissions feed a public leaderboard of automated submitters. You can read it as petty, or you can read it as the first attempt at something a search-engine spam team would recognise: a shared blocklist for the actor side of an asymmetric inbox problem.

Some teams are moving programs off public surfaces entirely and onto verified-identity platforms like HackerOne or Bugcrowd, where the platform handles attribution. Small projects can’t usually afford the platform fees. Bigger ones lose the discovery effect of an open program.

The most interesting idea I’ve seen circulating among maintainers is what someone called proof of code. Before you can submit a bounty report, you have to land a small, accepted PR somewhere in the project. The PR doesn’t have to be security-related. It just has to be real engineering by a person who understood the surrounding code. That converts a cheap text-generation game into a more expensive engineering game. It doesn’t solve the problem. It tilts the asymmetry the other way.
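
The gate itself is almost trivially small, which is part of the appeal. Here is a minimal sketch, assuming the project can already list its pull requests with an author and a merged flag. The `PullRequest` type and `may_submit_report` function are hypothetical names for illustration, not anything a real platform exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical record of one pull request against the project."""
    author: str
    merged: bool


def may_submit_report(submitter: str, project_prs: list[PullRequest]) -> bool:
    """Allow a bounty report only from someone with at least one merged PR."""
    return any(pr.merged and pr.author == submitter for pr in project_prs)


# Made-up history: alice landed a PR, bob's PR was never merged.
history = [
    PullRequest(author="alice", merged=True),
    PullRequest(author="bob", merged=False),
]

print(may_submit_report("alice", history))  # True
print(may_submit_report("bob", history))    # False
print(may_submit_report("carol", history))  # False
```

The check is cheap for the project. The expensive part sits with the submitter, who has to get a human reviewer to accept real work before the form opens at all.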

The same dynamic is coming for everything that depends on an open queue with humans on the receiving end. Customer support tickets. Conference paper submissions. Open peer review. Public consultation periods on regulation. Government FOIA requests. Hiring inboxes. The comment thread under this post, probably.

The economic concept is older than the internet. It is the tragedy of the commons applied to attention instead of grazing land. What’s new is the rate at which the cost of producing input is falling. Most of the institutions that grew up around the old cost curve don’t have time to redesign before the next drop.

The next decade of public-facing system design is going to be about adding friction back in on purpose. Identity. Deposits. Reputation. Proof of work. Vouching. We spent twenty years stripping friction out of every form on the internet because friction was the enemy of growth. Now friction is the only thing protecting the people on the receiving end of those forms. The companies that figure out how to add it without losing the magic are going to eat everyone else.

There are two things people now mean by “AI agent,” and they are nearly opposite.

One is the slop pipeline that broke Turso. Cheap inference pointed at any open queue, with no human in the loop, generating volume that other humans then have to filter. The person whose name is on the submission usually isn’t watching what’s submitted. Often they don’t even know what was submitted.

The other is the tool a person uses to do their actual work. An agent that drives the browser tab they’re sitting in front of, that reads the codebase they’re working on, that runs the command they would have typed. The person is at the keyboard. They see every action. Nothing gets sent to anyone else’s queue without them watching it happen.

These two things share a phrase and almost nothing else. The first one is a tax on every public surface on the internet. The second one is a power tool. The fact that they have the same name in the press is going to cost the second category a lot of goodwill before things sort themselves out.

I work on the second kind. Browy is an open-source AI agent that lives in a Chrome side panel and a DevTools REPL. It drives your real browser tabs through chat. There is no Browy server in the loop. There is no inbox you can flood by talking to it. There is a person, you, watching every click and every form fill it makes. Whatever happens to the public commons over the next few years, the part of the internet that’s a tool you operate yourself is still going to be a good place to live.

The 30-second video version of this argument is at the top of this post. Turso’s own write-up is the post quoted at the start, “The wonders of AI.”