
Digging Deeper Into Author Earnings

27 May 2015

From Phoenix Sullivan via David Gaughran:

The Author Earnings team are attempting to do something which hasn’t been done before, and their work can’t be refined and improved unless there is some intelligent criticism of their approach and findings.

Today I’ve invited Phoenix Sullivan to blog on the topic. I’ve known Phoenix for a few years now, and if there’s a smarter person in publishing, I haven’t heard of them.

KBoards regulars will already know that Phoenix understands the inner workings of the Kindle Store better than anyone outside Amazon.

. . . .

I set aside some time recently to dive into the Author Earnings raw data for the May 1, 2015 Report. The irksome thing about the scraped data is how much of the puzzle that is Amazon’s ebook sales is missing and/or open to interpretative analysis. It isn’t the data’s fault or even the fault of the collection method. It’s simply that the data made public is limited, which in turn means a lot of creative interpretation goes into even so simple a task as coming up with the number of ebooks sold in a day. While the raw data itself isn’t changeable, different tools and assumptions applied to the data can yield different results, thereby opening up the analysis to differing interpretations.

My goal was to apply a set of tools and assumptions that update and possibly correct those being used by the Author Earnings team. The environment has changed dramatically in the 15 months since the first report came out, yet the analytical tools, in my opinion, haven’t necessarily kept up with the times. That in itself does not mean the results are wrong, but without a challenge to them, we’ll never know, right?

. . . .

By far the biggest assumptive correction I’ve made is two-fold. The first part is applying a new set of sales:rank calculations to the dataset; the second is using the sales needed to maintain a rank rather than the multipliers needed to hit a rank. Let’s be clear that these multipliers are observed only: best guesses across a lot of observations. However, I do believe the multipliers AE currently uses are 1) outdated and 2) don’t reflect the actual number of sales for the majority of books, which are maintaining their rank in the store rather than seeing huge day-to-day rank swings.
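Applying a sales:rank table to scraped data amounts to a bracket lookup: find the bracket each title's rank falls into, read off an estimated daily sales figure, and sum. A minimal Python sketch of that idea, assuming a hypothetical table; the bracket bounds and sales figures below are placeholders for illustration only, not Phoenix's or AE's actual numbers:

```python
from bisect import bisect_left

# HYPOTHETICAL placeholder table: (upper rank bound of bracket, est. daily sales).
# These are NOT the AE or Phoenix figures; swap in a real hit or maintain chart.
PLACEHOLDER_TABLE = [
    (10, 1000.0),
    (100, 200.0),
    (1000, 50.0),
    (10000, 8.0),
    (100000, 1.0),
]


def daily_sales_for_rank(rank, table=PLACEHOLDER_TABLE):
    """Return the estimated daily sales for the bracket containing `rank`.

    Ranks beyond the last bracket are treated as selling nothing measurable.
    """
    if rank < 1:
        raise ValueError("Amazon ranks start at 1")
    bounds = [upper for upper, _ in table]
    i = bisect_left(bounds, rank)  # first bracket whose upper bound >= rank
    return table[i][1] if i < len(table) else 0.0


def total_daily_sales(ranks, table=PLACEHOLDER_TABLE):
    """Sum estimated daily sales over a list of scraped ranks."""
    return sum(daily_sales_for_rank(r, table) for r in ranks)
```

Because the table is just a parameter, re-running the same scraped ranks through a "hit" table and a "maintain" table is a one-argument change, which is what makes comparing the two sets of assumptions cheap.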

. . . .

Amazon’s algorithms take historical sales – among other variables, such as velocity – into consideration when calculating rank. The longer a title remains around a given rank, the fewer sales it takes to maintain that rank: observably, anywhere from 10-50% fewer. That means the multipliers for hitting ranks are not good indicators of unit sales for the majority of books in the dataset. Here is my observed chart for average sales to maintain rank, along with the old and new numbers for hitting rank. More work needs to be done to fill in the upper brackets on the maintain side. Where I felt I didn’t have enough data points on the Maintain side to chart new numbers, I reused the numbers from my Sales to Hit chart, but the safe assertion is that the Top 500 in my own data is over-reporting by a conservative 10%.
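The maintain-versus-hit adjustment can be sketched as scaling a hit-based estimate down by the observed 10 to 50% range, while Top 500 titles (where hit numbers stood in for missing maintain data) get only the conservative 10% trim. A minimal Python sketch; the function name, the default discount of 30%, and the choice to treat the two adjustments as alternatives rather than stacking them are assumptions, one reading of the text rather than Phoenix's actual procedure:

```python
def maintain_estimate(hit_sales, rank, discount=0.3, top_rank_trim=0.10):
    """Convert a 'sales to hit rank' figure into a 'sales to maintain rank' one.

    ASSUMPTIONS (one reading of the text, not the author's exact method):
      * outside the Top 500, maintaining a rank takes `discount` fewer sales
        than hitting it; the text observes roughly 10 to 50%;
      * inside the Top 500, where hit figures stood in for missing maintain
        data, apply only the conservative `top_rank_trim` (10%).
    """
    if not 0.10 <= discount <= 0.50:
        raise ValueError("discount outside the observed 10-50% range")
    factor = 1.0 - top_rank_trim if rank <= 500 else 1.0 - discount
    return hit_sales * factor


if __name__ == "__main__":
    # Bracket the observed range for a hypothetical title needing 100 sales/day to hit its rank.
    for d in (0.10, 0.30, 0.50):
        print(d, maintain_estimate(100.0, rank=2000, discount=d))
```

Running the whole dataset once at each end of the 10 to 50% range shows how much of the final unit total rides on this single assumption.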

. . . .

Integrating KU into the reporting back in July dialed the difficulty of analyzing the data up into the stratosphere. Unread – and therefore unpaid – borrows influence rank across all titles. There’s no way to know how many borrows eventually become paid reads. And there’s no way to calculate how many units moved on any given title were at full price and how many were borrows, either paid or unpaid. Self-reported numbers suggest the split of paid sales to paid borrows is about 50:50, which is also what the AE Reports use (and which still doesn’t account for the unread borrows that inflate rank). Using the Maintain chart above, I rejiggered all the numbers. The adjusted royalties may well still be inflated, but are, I think, a closer approximation. The difference for the dataset is a statistically significant 21.4% spread in dollars (roughly a $400 million difference per year: $1.81 billion versus $1.42 billion):

  • $4,957,365 – original AE result for all earnings
  • $4,848,116 – AE results with the new modeling applied
  • $3,895,691 – my adjusted estimate

and for the KU amounts specifically:

  • $167,687 – AE results for borrows with the new modeling applied
  • $144,201 – my estimate
  • 252,161 – AE estimate for total number of KU units sold/borrowed using the Maintain calculations for Indies + Uncategorized
  • 216,410 – my estimate
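The headline percentage and the per-year dollar figures follow from the daily totals listed above by simple arithmetic. This Python snippet re-derives them, assuming (as the $1.81 and $1.42 billion figures imply) that the one-day totals are annualized by multiplying by 365:

```python
# Daily totals quoted in the excerpt (dollars and KU units per day).
AE_ORIGINAL = 4_957_365
AE_NEW_MODEL = 4_848_116
PHOENIX = 3_895_691
KU_AE_DOLLARS, KU_PHOENIX_DOLLARS = 167_687, 144_201
KU_AE_UNITS, KU_PHOENIX_UNITS = 252_161, 216_410


def pct_drop(base, other):
    """Percentage by which `other` falls below `base`."""
    return 100.0 * (base - other) / base


def annualize(daily, days=365):
    """Scale a one-day total to a year (assumes every day looks like this one)."""
    return daily * days


if __name__ == "__main__":
    print(f"All earnings spread: {pct_drop(AE_ORIGINAL, PHOENIX):.1f}%")
    print(f"New model vs original AE: {pct_drop(AE_ORIGINAL, AE_NEW_MODEL):.1f}%")
    print(f"Per year: ${annualize(AE_ORIGINAL) / 1e9:.2f}B vs ${annualize(PHOENIX) / 1e9:.2f}B")
    print(f"Annual gap: ${(annualize(AE_ORIGINAL) - annualize(PHOENIX)) / 1e6:.0f}M")
    print(f"KU dollars spread: {pct_drop(KU_AE_DOLLARS, KU_PHOENIX_DOLLARS):.1f}%")
    print(f"KU units spread: {pct_drop(KU_AE_UNITS, KU_PHOENIX_UNITS):.1f}%")
```

Run as-is, it prints the 21.4% spread and $1.81B versus $1.42B per year, with an annual gap of about $388M (the "$400 million" is a rounded figure); both KU comparisons come out near 14%.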

. . . .

Since the AE Report looks at aggregated totals rather than individual sales, and positions itself as one factor for authors to consider when deciding which publishing path to pursue, I decided to see what each book averaged in each publishing path. There are pie charts below, but let’s also use words so the picture is clear either way it’s expressed. Looking at gross sales, the Big 5 had only about half as many titles in the dataset as indies had. Big 5 books sold about 78% as many copies as indie books and grossed more than twice as much. A lot of that goes into publisher and Amazon pockets, but what does that really mean? The charts show that indie authors in aggregate earned about 25% more than Big 5 authors. In other words, it took almost 50% more available indie books to earn their authors 25% more than Big 5 authors.
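The per-title comparison reduces to dividing each path's aggregate author earnings by its title count. A Python sketch that works only in ratios (indie relative to Big 5), so no dollar figures are needed. The paragraph phrases the title count two ways ("about 50%" as many Big 5 titles, and "almost 50% more" indie titles), so the sketch takes the ratio as a parameter and tries both:

```python
def per_title_ratio(indie_titles_per_big5_title, indie_earnings_per_big5_dollar):
    """Indie per-title author earnings as a multiple of Big 5 per-title earnings.

    Both inputs are ratios of indie to Big 5 aggregates, e.g. 2.0 means indies
    had twice as many titles; 1.25 means indie authors earned 25% more overall.
    """
    if indie_titles_per_big5_title <= 0:
        raise ValueError("title ratio must be positive")
    return indie_earnings_per_big5_dollar / indie_titles_per_big5_title


# Indie authors earned about 25% more in aggregate (from the text).
EARNINGS_RATIO = 1.25

for label, title_ratio in [
    ("Big 5 had ~50% as many titles (indies ~2x)", 2.0),
    ("indies had ~50% more titles (1.5x)", 1.5),
]:
    r = per_title_ratio(title_ratio, EARNINGS_RATIO)
    print(f"{label}: indie per-title author earnings = {r:.3f} x Big 5")
```

Under either reading the average indie title earns its author less than the average Big 5 title (0.625x or about 0.833x), even though indie authors earn more in aggregate.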

. . . .

From the above, we can say that while market share may have eroded for the Big 5, gross sales plateaued between January and May. Losing market share is not the same as bleeding money. Besides, the ebook market – discrete from the general publishing market – is relatively new. The Big 5 were never part of that market until it became lucrative enough to play in, and only once indies were invited into the market did it start to burgeon. Not because of indies, but the timing is inseparable. The Big 5 never dominated the market, and a few points of deviation here and there don’t mean they’re losing it. And while percentage charts are pretty to look at, they don’t always paint an accurate picture. Ebooks, for instance, have lured a certain percentage of customers away from the used-book market. The Big 5 were not in the used-book market before, and their models don’t include that market now.
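The difference between losing share and losing money is easy to see with made-up numbers: if the whole market grows while one segment's sales hold flat, that segment's share shrinks anyway. A Python sketch; every figure below is hypothetical, chosen only to illustrate the arithmetic, and none comes from the AE data:

```python
def market_share(segment_sales, total_sales):
    """A segment's share of total sales, as a percentage."""
    return 100.0 * segment_sales / total_sales


# HYPOTHETICAL daily gross sales in dollars, not taken from the AE data.
big5_jan, total_jan = 400_000, 1_000_000
big5_may, total_may = 400_000, 1_250_000  # Big 5 flat, whole market up 25%

print(f"Jan share: {market_share(big5_jan, total_jan):.1f}%")  # 40.0%
print(f"May share: {market_share(big5_may, total_may):.1f}%")  # 32.0%
print(f"Change in Big 5 sales: {big5_may - big5_jan}")          # 0
```

An eight-point drop in share here comes with zero change in the segment's own sales, which is why a share chart alone can't say whether anyone is bleeding money.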

Link to the rest at Let’s Get Digital and thanks to SFR and several others for the tip.

Amazon, Author Earnings/Vanity Presses, David Gaughran, Ebooks

30 Comments to “Digging Deeper Into Author Earnings”

  1. I’ll admit I didn’t dig too deep here, but criticizing the conclusions other people are drawing and then drawing one like ‘e-books are eating used paper sales’ without substantiating it doesn’t make me want to do that digging.

    • Anecdotally (not a word?), my buddy Carl of Cal’s used books, which has been in business for many decades, has seen a drop in paperback sales, but his sales of everything else have remained strong.

      Note his % drop is nowhere near the % increase of ebooks. We both theorize the pie has gotten bigger.

      My buddy who owns the comics shop says comics are up despite digital comics becoming more popular. We both figure that the rise of comic-book movies has grown the pie substantially, faster than people going digital can shrink it. There’s also the collectible aspect of comics, which will probably keep print strong; his shop is a collectibles shop with about half the space devoted to comics.

      My board game place is doing mad amounts of business. They’ve taken over 3 doors of a strip mall, 3 storefront sections. Even with video games and home gaming getting stronger every year, they are expanding. And their service, layout and location are abysmal. Yet they still thrive. They provide places for people to gather and play games, especially Magic and various Warhammer-type miniature games. Again, the consensus was that the pie was expanding, and in this case there’s probably something intrinsically different about face-to-face gaming that cannot be replaced by online gaming. Also, the huge popularity of Magic online has not cut into their Magic sales but rather increased sales and game-night turnout.

      Lots of guesswork there but the conclusions seem reasonable.

      • The game shop that I know of is also doing bunches of business. It’s in Ithaca, New York, where Cornell U. is located. Where is yours?

  2. Phoenix’s observations and critiques of AE’s data and methodology have consistently earned our respect — they’ve always been sharp, numerate, data-driven, relevant, and insightful.

    If anyone can appreciate the substantial time commitment that she put into independently analyzing the data here, believe me, it’s me. 🙂 And I’m deeply grateful to her for her willingness to do so and to share her findings.

    This is EXACTLY why we publish the raw data… in the hopes that others will look at it independently from another angle, pressure-check us, tease out more valuable insights, and share them.

    I think Phoenix is spot on with a lot of her points here. When I get a chance, we can examine some of the other points where a little more background or contextual nuance would be helpful, and see if, together, we can add to our collective understanding.

    While I won’t have an opportunity to reply in more depth until later, I’m absolutely thrilled that Phoenix was able to do this and share her findings with us all.

    • [Nods wisely, pretending she knows what Data Guy and Phoenix are talking about.]

    • I think it’s great that she did this, but it doesn’t really seem to contradict what seem to me to be the main conclusions of the AE reports, which as I see it are:

      1. You’re more likely to make more money self-publishing than being traditionally published.

      2. Indy ebook sales are rising, both in total numbers and market share compared to traditional.

      What she mostly disagrees with is by how much. She also, correctly I would assume, says that ebook market share isn’t much of a concern for traditional publishers, who have other agendas. But that doesn’t really seem to dispute the AE reports’ conclusions or purpose.

      The two most interesting things she brought up are, first, that Kindle Unlimited skews numbers all over the place because of ghost borrows, so it’s likely total sales are inflated, though there’s no way to know by how much.

      Second, and very interesting: writers signed to Amazon Publishing are making a ton more money than either traditional or indy writers. That was fascinating and terrific info. So if Amazon offers you a book deal, seriously consider it.

      • The thing about KU “ghost borrows” that people most often ignore is that many of those ghost borrows — most likely the vast majority of them — also result in “ghost revenue” a few days later, without providing any second bump to that book’s ranking, when the borrower actually reads them.

        Whenever I pull a book out of KU/Select, for example, “ghost revenue” from KU/KOLL continues to roll in daily for weeks afterward, before it finally tails off and dies out. I don’t get additional “ranking credit” for any of that revenue.

        So the vast majority of observable “ghost borrows” most likely result in no artificial boost in rankings at all on average, but only cause a floating delay between the time of the ranking bump and the time revenue actually appears in the dashboard.

        As both Phoenix and AE have separately shown, the size of the monthly KU “pot” indicates that the true number of KU borrows is reasonably close to our predictions. (The 20%-ish delta in her estimate is my fault: she used an inaccurate number (55%) from our graph titles that was a last-minute eyeball guess of mine. When we sum up the unit sales in the spreadsheet, the true percentage lands closer to 63%, which would bring Phoenix’s calculated comparison with the KU payout much more closely into line with ours.)

      • I don’t think Phoenix’s goal was to contradict anything in the AE reports. I think her goal was to further refine the knowledge we can wring from the AE data. That’s always useful.

        Ghost borrows are such a tricky thing. They give us a ranking boost now, but maybe not boosted revenue ever. If they do give us boosted revenue later, we won’t get a commensurate ranking boost at that time. Some of my books have been out of KU for more than six months now, yet I still sometimes get payouts from KU on those titles. But the rankings on those books remain “meh” unless I invest in promotion.

    • And that kind of response is the difference between a scientific approach and an ideological one!

    • We’ll look forward to seeing further discussion, Data Guy.

      The more, the better, I think.

  3. No one told me there’d be math!!!

  4. I may be misreading the numbers, but the increased sales needed to hit and hold the middle rankings suggest more titles are achieving the previous level, which to me means a pie that is growing, and growing for the “middle class”.

    Seems like the market is becoming just a bit less top heavy.

  5. One thing remains unchanged: When you’re self-publishing, you’re responsible for everything that can lead to success.

    This can be a daunting task if you don’t have the skills or don’t want to take the time to learn them. But if you think you know what you’re doing, and recognize where to hire help, it can be liberating.

  6. Out of curiosity, I took the old and new “To Hit” matrices, and used them to fit power laws to the data (just a quickie analysis in Excel). The result was that the best-fit power law to the original matrix had an R-Square of .9635, while the new one had an R-Square of .8932.

    In layman’s language, this means that the original matrix had a better fit to a power law, which is the functional form that such data usually conforms to. Now, there is no necessary reason that observed data should fit a theoretical functional form, but in most cases it turns out to be approximately true. Of course, if Amazon’s sales rank to sales is a blend of various sales measures (daily, weekly, etc.), then we might expect the power-law fit to be degraded somewhat.

    If anything, I tend to think that the Author Earnings matrix might underestimate sales at the very top – the matrix departs from the power-law near the top of the range. But that can happen with real data fits to power law forms, so my quibble isn’t all that strong.

    Anyway, it is interesting stuff. Only Amazon knows for sure.

    • Slight error – I typed a number wrong in one column of the second relationship.

      R-Square of power law fit for original relationship = .9635
      R-Square of power law fit for new relationship = .9503

      So, the fit to power laws is pretty close, between the two. However, once you take logs, the graph of the second relationship looks to depart from a straight line a bit more than the first, a bit curvilinear.

  7. I’m not a math whiz, but based on my new release’s sales, it takes more than 25 or 30 sales to hit a rank of 5,500. Mine sold more than triple/not quite triple that (based on those two estimates), and just squeaked in under 6k in rank its first day out.

    I know it’s hard to estimate, since the ranking algorithms don’t necessarily update hourly. Sometimes, they update every 2 hours, or they can update hourly, or every 4 hours.

    I’m fascinated by the whole thing, and have tracked the heck out of my sales/sales ranks on Amazon since my series began to sell decently. I’m just not that great at analyzing the data I’ve collected, though I try. =)

    But I also note that 12 sales a day doesn’t keep a book at a certain rank. My series books maintain relatively steady sales after the release “blush” is over, but they also steadily drop in ranking.

  8. I really only consider one important statistic. There are far, far, far more people out there buying ebooks than there are people actually buying MY ebooks. That’s an imbalance that translates into an opportunity and it’s the one I’m trying to fix. Not very well, but I’m trying…
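Comment 6 above describes fitting power laws to the "To Hit" matrices and comparing R-square values. The usual way to do that (and, as far as I know, what Excel's power trendline does) is ordinary least squares on log(rank) versus log(sales), with R-square measured on the log scale rather than on raw sales. A minimal Python sketch; the sample points in the usage lines are invented for illustration and are not either matrix:

```python
import math


def fit_power_law(ranks, sales):
    """Fit sales ~= a * rank**b by least squares on log(rank), log(sales).

    Returns (a, b, r_squared), with R-square computed on the log scale.
    """
    if len(ranks) != len(sales) or len(ranks) < 2:
        raise ValueError("need at least two paired points")
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sales]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log plot = exponent
    log_a = mean_y - b * mean_x        # intercept on the log-log plot
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_a), b, r_squared


if __name__ == "__main__":
    # INVENTED rank/sales points, for illustration only.
    a, b, r2 = fit_power_law([1, 10, 100, 1000, 10000], [5000, 1200, 300, 60, 8])
    print(f"sales ~= {a:.1f} * rank^{b:.3f}  (log-scale R^2 = {r2:.4f})")
```

Measuring R-square on the log scale matches the "once you take logs" check in the follow-up reply: a curvilinear log-log plot shows up as larger residuals and a lower R-square, even when the raw-scale curve looks close.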
