




An experiment
(with questionable results)
Slop is all about human attention. LLM-generated code is slop when no person owns it, understands it, and has verified it works.
If we can quantify how much attention a given piece of software needs (attention cost), and measure how much attention it received (attention spent), we can calculate how "sloppy" it is.





We use a project's git history and GitHub activity to estimate the two attention quantities above. For attention cost, we look at the codebase's historical size. For attention spent, we look at "signals of human interaction" such as commits and PR comments.
A week-by-week slop score is then calculated from the estimates.
Weeks that see significant amounts of code added, with disproportionately little human activity, increase the sloppiness of the project. Weeks with high human activity and few (or negative) code additions reduce it.
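The weekly scoring idea can be sketched roughly as follows. This is an illustrative toy, not the actual algorithm: the `Week` fields, the `signals_per_line` rate, and the linear cumulative formula are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Week:
    lines_added: int    # net code added this week (proxy for attention cost)
    human_signals: int  # commits, PR comments, reviews (proxy for attention spent)

def slop_score(weeks, signals_per_line=0.01):
    """Cumulative sloppiness over a project's history.

    Each week, the code added implies some amount of human attention it
    "costs"; signals actually observed pay that cost down. Weeks with lots
    of code and little activity raise the score, weeks with high activity
    and little new code lower it. The score never goes below zero.
    """
    score = 0.0
    for w in weeks:
        expected = w.lines_added * signals_per_line  # attention the new code needs
        score += expected - w.human_signals          # unmet cost accumulates as slop
    return max(score, 0.0)
```

For example, a week with a 5,000-line code drop and a single commit scores high, while a quiet week full of review activity pulls the total back down.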
Unreliable, unfortunately.
For many repos they're plausible, but for just as many they're clearly incorrect.
Accuracy depends heavily on having enough human interaction signals: a feature developed behind closed doors and then code-dropped all at once comes with very few signals attached. To the algorithm it looks indistinguishable from a ladleful of steaming LLM slop.
Other factors can also throw off the estimates: for example, vendored dependencies kept in non-standard folders, or large code files holding configuration or demo data.
The algorithm does try to account for some of these exceptions, but it seems that, ultimately, the two measures we're using are just too indirect.
Hop over to GitHub, open a PR, and you'll get (pending approval) a preview environment where you can test your theories.
Visit pscanf's blog for a more detailed explanation and analysis of the experiment.