In July and August this year, we were seeing a lot of posts like this:
Recently, an indie author, Caitlyn Lynch, tweeted about noticing that only 19 of the best sellers in the Teen & Young Adult Contemporary Romance eBooks top 100 chart on Amazon were real, legit books. The rest were nonsensical and incoherent, and seemingly AI-generated. …
The Motherload website later looked into dozens of books on the platform and saw that a few days after Lynch’s tweets, the AI books had vanished from the best-seller lists, probably removed by Amazon.
They were, however, still available for purchase, and had enjoyed a significant amount of visibility before vanishing. Also, as Lynch very understandably speculates, the mass uploading of AI-generated books could be used to facilitate click-farming, where ‘bots’ click through a book automatically, generating royalties from Amazon Kindle Unlimited, which pays authors by the amount of pages that are read in an ebook. So, it doesn’t matter that these books disappear. The people running such a scheme could just upload as many as they like to replace the removed ones.
This is obviously a problem, potentially a much bigger problem than I thought it would be. The obvious solution: Amazon needs to crush every fake AI-generated pseudobook like a bug and nuke the people uploading those books. I mean a ban-for-life, the way they do to identified scammers, which is what these people are.
I’ve seen a number of opinion pieces declaring that Amazon won’t do that because they don’t care about garbage pseudobooks as long as they’re making money. This is probably wrong. I want to say obviously wrong, but I’m not sure I’d go that far. It’s wrong because Amazon is all about presenting readers with books that will make them happy, and wading through mountains of garbage pseudobooks does not make people happy. It makes them mad.
The problem, it seems to me, is that it’s hard to identify AI-generated garbage.
The solution, it seems to me, is to get a lot better at identifying AI-generated garbage as fast as possible, crush fake books like bugs, and nuke from orbit the people who are uploading them.
Amazon is (as far as I’ve heard) very, very willing to delete your account and ban you for life if you try to cheat in ways they have decided matter. Once they nuke your account, you are done at KDP, because (as far as I’ve heard), they don’t give a lot of second chances. (They are apparently perfectly fine with scammers using various other methods they haven’t yet decided to care about.) (No, that is not ideal.)
While I guess this situation could play out in various ways, I will just note that Amazon KDP suddenly has a brand-new button on the “content” page at KDP. “Is any part of your book generated by AI?” asks the button. “Click yes or no.”
While there is no “Because we’re going to crush your fake book like a bug” notification, it’s pretty obvious that KDP will soon be able to exercise various options:
A) You check “Yes” to that question. They let you upload your fake book, but they drop it into a dungeon along with almost all low-content books. No one ever sees it. It’s not presented to readers in KU. Problem solved.
B) You check “No” to that question. They run the text of your book through an AI detector they are currently beta-testing and, if it fails, they give you one chance to explain why and then they crush your book like a bug and nuke you for lying to them.
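The two options above amount to a simple decision rule. Here’s a minimal sketch in Python of how it might look, assuming a hypothetical detector that returns a probability between 0 and 1; the function name, the threshold, and the outcome labels are my own invention, not anything Amazon has announced:

```python
def kdp_decision(declared_ai: bool, detector_score: float,
                 threshold: float = 0.5) -> str:
    """Hypothetical KDP moderation rule for one uploaded book.

    declared_ai: what the uploader checked on the new "generated by AI?" question.
    detector_score: probability the text is AI-generated, per some detector.
    """
    if declared_ai:
        # Option A: accepted, but buried with the low-content books.
        return "dungeon"
    if detector_score > threshold:
        # Option B: likely a lie; one chance to explain, then the hammer.
        return "flag for review"
    return "publish normally"
```

So an honestly labeled AI book goes to the dungeon, and a book that claims to be human but scores 0.98 on the detector gets flagged.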
And, basically, I would be fine with that. I would hope not many real authors will get caught if and when Amazon brings down an enormous hammer, but for crying out loud, 4/5 of all the top books are fake? That hammer needs to come down hard, as soon as possible.
Question: how good are AI detectors at this point?
I’ve heard they’re not great, at least the ones available free to whoever wants to poke at them, but the only one I tested identified my text as close to 100% human-generated. The one sentence that got flagged in the report I was writing was “See figure 1, below,” which I thought was funny. I still think that’s funny, and it also indicates that most text is going to come back less than 100% human generated because certain standard word combinations are just that — standard.
On the other hand, it ought to be possible to just draw a fairly arbitrary line, like “If more than 15% of the text comes back AI generated, it’s a fake book.” Or 25% or for that matter 70%, or anything that will let a detector reliably distinguish garbage pseudobooks from real books without generating too many false positives.
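A cutoff like that is easy to state precisely. A sketch of the idea, using the 15% / 25% lines from the paragraph above as examples (nothing here reflects how any real detector works internally):

```python
def is_fake_book(percent_ai: float, cutoff: float = 15.0) -> bool:
    # Flag the book if more than `cutoff` percent of its text
    # comes back as AI-generated.
    return percent_ai > cutoff
```

The tradeoff is visible immediately: a text scoring 80.2% human (so 19.8% AI) gets flagged at a 15% cutoff but passes at 25%, so raising the cutoff means fewer false positives at the cost of letting more fakes through.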
Here’s a detector: Copyleaks AI Content Detector. According to this detector, the first chapter of INVICTUS is human-generated. Or rather, there’s a 96.2% probability it’s human generated. Text of an email I wrote recently: 80.2% probability it’s human generated. I note that the conclusion just says “This is human text,” so 80% is good enough for this detector. It’s a short email; that may be why the probability dropped like that. It’s got fairly standard content; that may be why. A piece of a report I was working on a while ago: 69.5% probability it’s human generated! What a huge drop! That’s really interesting! You know what it still says? “This is human text.” I wonder how far the probability has to drop before it says it’s not human text?
I googled “text generator ai,” found this site, and entered this prompt: a 1000-word story about a girl and a black dog. This is what the generator gave me:
Julia had always been afraid of black dogs, but when she met one on her way home from school one day, she discovered that this dog was different. The dog, who she named Midnight, seemed to understand her fears and gently led her home. A bond quickly formed between Julia and Midnight, and the dog became her loyal companion and protector. Over time, Julia’s fear of black dogs faded away, and she learned that sometimes the things we fear the most can turn out to be our greatest blessings.
This isn’t anything like a thousand words, obviously, so I guess the generator ignores that part of a request. Whatever, I picked up this paragraph and dropped it in the AI detector linked above. Boom! 98% probability it’s AI generated. I tried a couple other prompts and got similar results.
Here’s another text generator. It says to enter a prompt and hit go. I entered “Once a girl met a black dog” and got this:
Once a girl met a black dog in these circumstances she would just get up and leave, unless of course that dog was the source of the problem and was aggressive. But in that case, I would have to deal with it and our dogs would have to get used to each other. It was a little hard for my young dogs to understand that most of the time the dogs were the problem, not the owner. My birds were at home. The bird room is small and full of birds. Even the floor was littered with feathers and bird toys. It was the first time I’d had this many birds at once, and I didn’t know how to deal with it. My attempts to have them in groups were less than successful. Some kept wanting to be with other birds. Some would try to kill others. Some were fine with one bird in the cage and that bird would be perfectly happy, but as soon as another bird joined the group, then the others in the group would hiss and fight and peck each other. All my birds, even my sister’s ex – therapy bird, were afraid of the floor.
This is fascinating! I sure did not see the birds coming! I dropped this into a different AI detector and found that it is “8% human generated.” I’m told I should edit this text until there’s less detectable AI content. The first paragraphs of INVICTUS? 100% human generated.
Out of curiosity, I dropped the bird thing into Grammarly’s plagiarism detector, which thought it was just peachy. I mean, it said there were errors, but it didn’t say it was AI generated or plagiarized.
The first chapter of INVICTUS? Grammarly’s plagiarism detector says “significant plagiarism found.” Really? The book isn’t out yet! How can it be finding plagiarism? I’m really startled! It also finds a whole bunch of issues with conciseness, word choices, grammar, and punctuation. I’m not paying for any kind of report, but seriously? I did not have a high opinion of Grammarly prior to this moment, but now my opinion is much, much lower. In fact, all of a sudden I’m wondering whether Grammarly is deliberately lying to get people to buy it so they can find out what part of their essay or book or whatever supposedly looks plagiarized. Suddenly that seems like a plausible scenario!
On the other hand, this ten-minute test of AI detectors seems to suggest that they’re maybe, kind of, pretty much, good at detecting AI-generated text? I hear they aren’t reliable, but whenever I poke at them, they seem pretty good at it. I think it’s reasonable to get a score of 69% human generated and declare it’s human generated enough. That kind of conclusion seems likely to help prevent too many false positives.
I think AI detection is going to get pretty reliable, I think people are probably working on that, and of course AI generation will get more subtle, but it’s not like “AI text generators” actually have brains or intelligence. I suspect detectors will get out in front and stay there for a bit. And … I hope I’m not too optimistic or pollyanna-ish, but I think it’s pretty likely Amazon is currently working on a detector and will pretty soon bring a giant hammer down on fake garbage pseudobooks. I hope I’m right about that, and that unanticipated side effects aren’t as dire as the problem it solves.