No one buys books (?)

A post: No one buys books

In 2022, Penguin Random House wanted to buy Simon & Schuster. The two publishing houses made up 37 percent and 11 percent of the market share, according to the filing, and combined they would have condensed the Big Five publishing houses into the Big Four. But the government intervened and brought an antitrust case against Penguin to determine whether that would create a monopoly. 

The judge ultimately ruled that the merger would create a monopoly and blocked the $2.2 billion purchase. But during the trial, the head of every major publishing house and literary agency got up on the stand to speak about the publishing industry and give numbers, giving us an eye-opening account of the industry from the inside.

And that is what the linked post is about.

I think I can sum up what I’ve learned like this: The Big Five publishing houses spend most of their money on book advances for big celebrities like Britney Spears and franchise authors like James Patterson and this is the bulk of their business. They also sell a lot of Bibles, repeat best sellers like Lord of the Rings, and children’s books like The Very Hungry Caterpillar. These two market categories (celebrity books and repeat bestsellers from the backlist) make up the entirety of the publishing industry and even fund their vanity project: publishing all the rest of the books we think about when we think about book publishing (which make no money at all and typically sell less than 1,000 copies).

Bold is mine.

I am actually not surprised. The reason I’m not surprised is that I’ve been looking at bestselling books lately, to look at first pages, and what is on the lists of bestselling SF novels, for example? Lots and lots of classics, such as DUNE, that have been out for a zillion years.

More from the linked post:

In my essay “Writing books isn’t a good idea” I wrote that, in 2020, only 268 titles sold more than 100,000 copies, and 96 percent of books sold less than 1,000 copies. That’s still the vibe.

I’m very surprised that 268 titles sold more than a hundred thousand copies. That’s quite a lot more than I would have expected. I’m also surprised that such a large percentage sold fewer than 1,000 copies. I would have guessed that mean sales would be at least twice that, though I wouldn’t have guessed they were a lot more than twice that.

The DOJ’s lawyer collected data on 58,000 titles published in a year and discovered that 90 percent of them sold fewer than 2,000 copies and 50 percent sold less than a dozen copies.

Fifty percent of traditionally published titles that come out from Big Five publishers sell fewer than a dozen copies? I find this very, very difficult to believe. I immediately downgrade this whole post because that is just impossible. OBVIOUSLY this is impossible. I’m trying to figure out what the author of this post has left out, what kind of context could make this be remotely accurate.

A) The lawyer was also collecting data on self-published titles. But how? Amazon doesn’t share data about sales. That’s why people estimate sales using tools like the calculator at Publisher’s Rocket.

B) Oh, wait, maybe the lawyer collected data on a whole lot of backlist titles, such as, I don’t know, Red Moon and Black Mountain, with a sales rank of approximately 2,394,000, meaning definitely under a dozen sales per year right now. The Publisher’s Rocket calculator won’t say anything more specific than “less than a book per day,” but also suggests that a sales rank of 100,000 equals about one book per day. So, if the lawyer meant that he’d collected data on 58,000 books, a lot of which are not available as ebooks or something, that could do it.

But, this would also mean that “published in a year” doesn’t mean “newly published.” It would have to mean “in print” or “sort of in print” rather than “published this year.” So … maybe this isn’t what the lawyer meant. But then what did he mean? Given that it is totally impossible that half of all books traditionally published in a year sell a dozen copies or fewer.

So, I went fishing, and found this article here: No, Most Books Don’t Sell Only a Dozen Copies

Well, I’m glad someone is both incredulous and willing to explain what is going on.

But publishing statistics are often not what you think. This extreme 12 copies claim joins a couple others that have gone around the internet recently: “98 percent of books sell fewer than 5,000 copies.” “90 percent sell fewer than 2,000 copies” “Most books sell fewer than 99 copies.” Etc.

Are all of these true? None of them? Part of the problem with evaluating claims of “most published books sell [X] copies” is that it—[apologies for the Derrida voice]—it all depends on what you mean by “book,” “published,” and “sell.” No, I’m not playing postmodern games here. It really is confusing.

Here are some of the things that create this completely bogus statistic cited above:

A) Every edition of a book is counted as “a book,” so if there are 135 different editions of Pride and Prejudice, each is counted as “a book” separately.

B) No, “published” does not mean “this year.” Yes, books like Red Moon and Black Mountain are counted.

C) They are probably using BookScan data, which means leaving out ebook sales completely, and also leaving out sales to libraries and various other kinds of sales.

Then this:

In terms of the dozen copies statistic, I can’t evaluate it because it is unclear what it’s referring to. Fifty-eight thousand books is more books than PRH publishes in a given year, but far less than their entire backlist. Is 58k all new books published with an ISBN, including self-published books? Is it something else? I really don’t know and none of the publishing professionals I follow seem to know either. (Editing to add: Jane Friedman, who posted this number originally on Instagram, noted there was no source given in testimony. Friedman gives her own guess in the comments.)

Friedman’s guess is that the numbers come from BookScan and include university presses but not ebooks and not sales to libraries. Friedman says that she thinks the “90% of books sell less than 2000 copies” is probably a LOT more accurate. For what it’s worth, I agree. That number seems in line with what people actually say about sales in the real world.

But! You know what is in the comments to this post? A long comment from … ready? … a representative from BookScan. Here is what she says, which I am pulling out in its entirety below.

I am capitalizing PRINT every time it appears in the quote below because it is JUST CRUCIAL to be aware that these numbers DO NOT INCLUDE EBOOKS. Most but not all of the other all-caps is from the original.


Hey y’all, it’s Kristen McLean, lead industry analyst from NPD BookScan. I thought I would chime in with some numbers here, since that statistic from the DOJ is super-misleading, and I’m not sure where it originally came from, since we did not provide it directly.

It is possible it came from our data, and was provided by one of the publisher parties, but based on the 58,000 figure, it’s not obvious what exactly it includes in terms of “publisher frontlist”. 58,000 titles is way too small a number for “all frontlist books published in a year by every publisher”–that’s more like 487,000 frontlist titles–so it’s clear it’s a slice but I’m not sure HOW it was sliced.

NPD BookScan (BookScan is owned by The NPD Group, not Nielsen, BTW), collects data on PRINT book sales from 16,000 retail locations, including Amazon PRINT book sales. Included in those numbers are any PRINT book sales from self-publishing platforms where the author has opted for extended distribution and a PRINT book was sold by Amazon or another retailer. So that 487K “new book” figure is all frontlist books in our data showing at least 1 unit sale over the last 52 weeks coming from publishers of all sizes, including individuals.

Lots of press outlets have been calling about it today, so I did a little digging to see if I could reverse-engineer the citation, and am happy to share our numbers here for clarity.

Because this is clearly a slice, and most likely provided by one of the parties to the suit, I decided to limit my data to the frontlist sales for the top 10 publishers by unit volume in the U.S. Trade market. My ISBN list is a little smaller than the one quoted in the DOJ, but the principles will be the same.

The data below includes frontlist titles from Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute.

Here is what I found. Collectively, 45,571 unique ISBNs appear for these publishers in our frontlist sales data for the last 52 weeks (thru week ending 8-24-2022).

In this dataset:

>>>0.4% or 163 books sold 100,000 copies or more

>>>0.7% or 320 books sold between 50,000-99,999 copies

>>>2.2% or 1,015 books sold between 20,000-49,999 copies

>>>3.4% or 1,572 books sold between 10,000-19,999 copies

>>>5.5% or 2,518 books sold between 5,000-9,999 copies

>>>21.6% or 9,863 books sold between 1,000-4,999 copies

>>>51.4% or 23,419 sold between 12-999 copies

>>>14.7% or 6,701 books sold under 12 copies

So, only about 15% of all of those publisher-produced frontlist books sold less than 12 copies. That’s not nothing, but nowhere as janky as what has been reported.

BUT, I think the real story is that roughly 66% of those books from the top 10 publishers sold less than 1,000 copies over 52 weeks. (Those last two points combined)

And less than 2% sold more than 50,000 copies. (The top two points)

Now data is a funny thing. It can be sliced and diced to create different types of views. For instance we could run the same analysis on ALL of those 487K new books published in the last 52 weeks, which includes many small press and independently published titles, and we would find that about 98% of them sold less than 5,000 copies in the “trade bookstore market” that NPD BookScan covers. (I know this IS a true statistic because that data was produced by us for The New York Times.)

But that data does not include direct sales from publishers. It does not include sales by authors at events, or through their websites. It DOES NOT include EBOOK SALES which we track in a separate tool, and it doesn’t include any of the amazing reading going on through platforms like Substack, Wattpad, Webtoons, Kindle Direct, or library lending platforms like OverDrive or Hoopla.

BUT, it does represent the general reality of the ECONOMICS of the publishing market. In general, most of the revenue that keeps publishers in business comes from the very narrow band of publishing successes in the top 8-10% of new books, along with the 70% of overall sales that come from BACKLIST books in the current market. (Backlist books have gained about 4% in share from frontlist books since the pandemic began, but that is a whole other story.)

The long and short of it is publishing is very much a gambler’s game, and I think that has been clear from the testimony in the DOJ case. It is true that most people in publishing up to and including the CEOs cannot tell you for sure what books are going to make their year. The big advantage that publisher consolidation has brought to the top of the market is deeper pockets and more resources to roll those dice. More money to get a hot project. More money to influence outcomes through marketing, more access to sales and distribution mechanisms, and easier access to the gatekeepers who decide what books make it onto retailers’ shelves. And better ability to distribute risk across a bigger list of gambles.

It is largely a numbers game and I’m not just saying that because I’m a numbers gal. It’s a tough business.

And there you go, that is much more accurate information about PRINT sales.
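For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the percentages from the bucket counts quoted in the BookScan comment above. The counts are copied from the quote; the variable and function names are mine, and this is only a sanity check on the quoted figures, not a reconstruction of how BookScan produced them.

```python
# Bucket counts as quoted in the BookScan comment above
# (frontlist ISBNs from the top 10 publishers, PRINT sales only).
buckets = [
    ("100,000 or more", 163),
    ("50,000-99,999", 320),
    ("20,000-49,999", 1015),
    ("10,000-19,999", 1572),
    ("5,000-9,999", 2518),
    ("1,000-4,999", 9863),
    ("12-999", 23419),
    ("under 12", 6701),
]

# The counts should add up to the 45,571 ISBNs mentioned in the quote.
total = sum(count for _, count in buckets)


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in buckets:
    print(f"{label:>16}: {count:>6} books, {share(count, total):.1f}%")

# "Those last two points combined": everything under 1,000 copies.
under_1000 = share(23419 + 6701, total)

# "The top two points": everything at 50,000 copies or more.
over_50000 = share(163 + 320, total)

print(f"under 1,000 copies: {under_1000:.1f}%")
print(f"50,000 copies or more: {over_50000:.1f}%")
```

The buckets do sum to 45,571, and the combined shares come out close to the "roughly 66%" and "less than 2%" given in the quote.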

So, how about ebook sales? Out of curiosity, I went over to KDP and checked TUYO. This book has been offered free multiple times, maybe six times or so since it came out? Something like that. This inflates the number of ebooks downloaded relative to the print books sold.

When you include the free downloads, 1.2% of the “sales” have been of print books. The rest have been “sales” of ebooks, but a lot have been free downloads, not paid.

So let’s take the free ones out using KDP’s handy tool, and look again. Okay, now 16.7% of actual sales have been print. The rest have been ebooks.

None of that includes KU pages read. So, let’s look at that a different way. What proportion of all royalties have come from KU pages read compared to all sales? Ah, that looks like 63.4%.

What percentage of royalties are represented by print sales? A total of 5.4% of lifetime royalties have come from print sales. The rest of the royalties for this book have come from ebooks OR KU pages read.

This drives a pretty big stake through the idea that the BookScan numbers have a lot to say about the success of the books that occupy the long tail. The most successful books are wildly successful no matter how you slice it, but if print constitutes 15% of sales, then a novel that sells 1000 print copies may well have sold 5600 or so ebook copies — that isn’t so terrible.
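Here is a rough sketch of that back-of-the-envelope estimate. It assumes, as above, that print is a fixed share of total paid copies (15% in the paragraph above; the function name and parameters are just mine for illustration). Whether a single self-published title's print share applies to any other book is a big assumption, so treat the output as a ballpark and nothing more.

```python
def estimate_ebook_copies(print_copies, print_share):
    """Estimate ebook copies from print copies, assuming print makes up
    a fixed fraction (print_share, between 0 and 1) of total paid copies.
    """
    if not 0 < print_share <= 1:
        raise ValueError("print_share must be in (0, 1]")
    total_copies = print_copies / print_share
    return total_copies - print_copies


# 1,000 print copies at a 15% print share: roughly 5,667 ebook copies,
# in line with the "5600 or so" above.
print(round(estimate_ebook_copies(1000, 0.15)))

# The same estimate for a book that sold only 12 print copies: roughly 68.
print(round(estimate_ebook_copies(12, 0.15)))

# Using the 16.7% print share from the KDP numbers instead of 15%
# gives a somewhat lower ebook estimate for 1,000 print copies.
print(round(estimate_ebook_copies(1000, 0.167)))
```

The estimate is very sensitive to the print share, which is why this is a ballpark rather than a prediction.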

I grant, a book that sold 12 print copies may by the same estimation have sold only about 65 ebook copies, and that is really dire. But what if that includes, say, The Uruk World System: The Dynamics of Expansion of Early Mesopotamian Civilization, from the U of Chicago press, priced at $50? Or The Canary Code: A Guide to Neurodiversity, Dignity, and Intersectional Belonging at Work? That one is from Random House and it’s priced at $30. I wonder how many copies either of the above sold? I can see the second one selling a bit better, but I think “The Canary Code” is a weird title for something like this and I can see people who might be interested not noticing this book because of that title.

REGARDLESS, the idea that half of all traditionally published books sell fewer than a dozen copies is just wrong, it’s obviously impossibly wrong, and people ought to have known better than to assume that could possibly be true.

11 thoughts on “No one buys books (?)”

  1. This was a really helpful look at how these numbers get sliced and diced!
    I was excited to see Uruk World System get a mention since I discuss one of the chapters with my students every year! They don’t buy the book, though.

  2. Kathryn, I’m really amused that The Uruk World System is a book anybody here has actually heard of, not to mention using in a class. If anybody raised a hand and said, “Oh, I’ve read The Uruk World System,” I was assuming that would be Craig, since it’s just the kind of thing he’d read. But here we are. If he reads this post, maybe he’ll say whether or not he’s read it.

  3. The Uruk World System is new to me. It *is* the sort of book that I’d likely pick up if I saw it cheap enough, probably at a library book sale, but not spring for a new copy.

    I’d give long odds that sales to libraries (not included in those totals, right?) far outstripped sales to individuals. The Canary Code, probably not — I expect they’re hoping that a few H.R. departments will buy loads of copies, which doesn’t sound impossible on the face of it.

  4. Off topic – Sage Empress is out now, by Sherwood Smith, and it’s available in KU. If any of you liked Tribute and the Phoenix Feather series as much as I did, this will be exciting news. I’m off to read it now.

  5. I have not read — or heard of — The Uruk World System. It is the sort of thing I’d buy if I ran into it cheap, though, say at a library book sale. Not for $50 new. I’d give odds most of the sales are to libraries, since I doubt there are a lot of university classes using it.

  6. This was interesting, both as a reader and book buyer, and as someone whose work involves statistics. It can be hard to tell when a data-based argument moves from simplifying for clarity into oversimplifying into meaninglessness. I was just recently providing data from an annual survey to a reporter and going through a discussion internal to my team (before we even got to an external audience) about the numbers and how they were collected and calculated and why they were different from similar data from another source.

  7. dropping in here mostly to remark because I’m pretty sure people here would be interested: new peter beagle book coming out 5/14. I’m Afraid You’ve Got Dragons.

    I thought those statistics were highly suspect.

  8. “dropping in here mostly to remark because I’m pretty sure people here would be interested: new peter beagle book coming out 5/14. I’m Afraid You’ve Got Dragons”

    Ooooh. I don’t like everything Peter Beagle’s written, not by a long shot, but I’m definitely interested in at least reading an excerpt and seeing whether it strikes me as anything like The Last Unicorn. (I adore LU, and quite a few of his short stories, but none of his other novels…) And the Goodreads blurb seems to be hinting at a fairytale-tropes pastiche, which of course is one thing The Last Unicorn did very well indeed. If IAYGD strikes me as being similar to Last Unicorn in tone, or if it makes me laugh in the first few pages, I’m risking it.

    Thank you, Elaine! I probably wouldn’t have known it was coming out! (Hmm. Probably should follow Peter Beagle on Amazon…)

  9. Always an occasion when Beagle brings out a new book! I have liked most of his novels, but the only one I have re-read more than once is TLU.

  10. I’d be very interested to know how library borrowing fits into all this, as someone who borrows books a lot. (And decides which books to buy after I read them.) I get the point about it not affecting the economics much, but wouldn’t it be useful in determining the actual market? Also, how can sales TO libraries not be counted?? And another thing— I don’t think it’s fair to use statistics that cover both fiction and nonfiction together.

    Look at this comment from the OP about the title: “I mean it comparatively to other industries. That people might buy 75,000 copies of one book might be a lot to the publishing industry, but it’s tiny when you think about how many people listen to an album or watch a movie.” How can you really make that comparison? Books are nothing like movies or songs! The time/place/occasion (to kind of but not really misuse this phrase) is not the same!

    Re: books selling only 12 copies, this article on PM was quoted in the comments calling it “a sensational bit of banter from a DOJ lawyer — phrased as a question — that was not recognized as valid by one of the two expert witnesses.”:

  11. You’re very right, Mona: songs are nothing like novels. Thanks for the link! I think this impossible number is SO much easier to understand if considered as banter rather than a real statement about sales.
