A recently discovered hole in Valve’s API allowed observers to generate extremely precise, publicly accessible data on the total number of players of thousands of Steam games. While Valve has now closed this inadvertent data leak, Ars can still provide the data it revealed as a historical record of the aggregate popularity of a large portion of the Steam library.
The new data derivation method, as ably explained in a Medium post from The End Is Nigh developer Tyler Glaiel, centers on the percentage of players who have accomplished developer-defined Achievements associated with many games on the service. On the Steam web site, that data appears rounded to two decimal places. In the Steam API, however, the Achievement percentages were, until recently, provided to an extremely precise 16 decimal places.
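Here is a minimal Python sketch showing why the extra decimal places matter. It assumes a brute-force check over a range of candidate totals, and the function name `consistent_totals` and the 60,000 to 65,000 search window are hypothetical choices made for illustration. The sketch rounds Glaiel’s 8-out-of-62,587 example to two decimal places, as the Steam site does. It then counts how many totals in the window could have produced that rounded figure.

```python
def consistent_totals(rounded_pct, lo, hi):
    """Return every total d in [lo, hi] for which some whole-number count
    of players k yields a percentage that rounds to rounded_pct (two decimals)."""
    matches = []
    for d in range(lo, hi + 1):
        # The count nearest to the displayed percentage is the best candidate.
        k = round(rounded_pct / 100 * d)
        if round(k / d * 100, 2) == rounded_pct:
            matches.append(d)
    return matches


# Glaiel's example, 8 players out of 62,587, shown at two decimal places.
shown = round(8 / 62_587 * 100, 2)
candidates = consistent_totals(shown, 60_000, 65_000)
print(shown, len(candidates))  # 0.01 5001
```

Every total in the window fits the rounded 0.01 percent figure. On its own, the two-decimal number says almost nothing about the size of the player base.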
This added precision means that many Achievement percentages can only correspond to a ratio between specific whole numbers. (This is useful since each game’s player count must be a whole number.) With multiple Achievements to check against, it’s possible to reliably find a single common denominator that fits all the percentages. This process allows for extremely accurate reverse engineering of that denominator, which represents the total player base behind each Achievement percentage.
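The common-denominator search can be sketched in Python as a brute-force scan. The sketch assumes that the smallest total consistent with every percentage is the one wanted. It is an illustration, not Glaiel’s actual code, and the name `infer_player_total`, the `max_total` cap, and the `rel_tol` tolerance are all hypothetical. The tolerance is there to absorb floating-point noise, and real API values may need a looser setting than the default shown.

```python
import math


def infer_player_total(percentages, max_total=1_000_000, rel_tol=1e-9):
    """Return the smallest whole-number total consistent with every percentage.

    A total d is consistent with a percentage p if some whole number of
    players k gives k / d * 100 == p, up to the relative tolerance rel_tol
    (an assumed allowance for floating-point noise).
    """
    for d in range(1, max_total + 1):
        if all(
            # Try the whole-number count closest to p percent of d players.
            math.isclose(round(p / 100 * d) / d * 100, p, rel_tol=rel_tol)
            for p in percentages
        ):
            return d
    return None  # no consistent total up to max_total


# Hypothetical game: 62,587 players and three Achievements.
true_total = 62_587
counts = [8, 1_234, 40_000]  # invented per-Achievement unlock counts
pcts = [k / true_total * 100 for k in counts]
print(infer_player_total(pcts))  # 62587
```

The unlock counts above are invented for illustration. A linear scan gets slow for games with millions of players, but it keeps the logic easy to follow.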
As Glaiel points out, for instance, an Achievement earned by 0.012782207690179348 percent of players on his game translates precisely to 8 players out of 62,587 without any rounding necessary (once some vagaries of floating point representation are ironed out).
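For a single Achievement, one way to iron out those floating-point vagaries is Python’s standard `fractions.Fraction.limit_denominator`. This method returns the closest fraction whose denominator does not exceed a given bound. The sketch below applies it to the percentage quoted above. The bound of 100,000 is our own assumption and would have to exceed the real player count. Glaiel’s own method may differ.

```python
from fractions import Fraction

# The Achievement percentage quoted in Glaiel's example, read as an exact decimal.
pct = Fraction("0.012782207690179348")
share = pct / 100  # convert percent to a fraction of all players

# Closest fraction whose denominator is at most 100,000 (an assumed bound).
estimate = share.limit_denominator(100_000)
print(estimate.numerator, estimate.denominator)  # 8 62587
```

With only one Achievement, a much larger bound can leave room for a wrong fraction that happens to sit closer to the noisy value. That is one reason cross-checking several Achievements helps.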
Because this data is derived directly from Steam’s API for each game, it ends up much more precise than the old Steam Gauge/Steam Spy estimation methods, which relied on random sampling of a small portion of the Steam player base. But this method only works for games with developer-defined Achievements, so it covers about 13,000 of the roughly 23,000 games now on Steam.
It’s not exactly clear how Valve defines this “Achievement denominator,” which closely approximates but doesn’t precisely match the “players” statistics provided to individual developers. The new data also gives no indication of how many people own the game without having played it. And, in very rare cases, this method could come up with a denominator that’s off by a factor of two if every Achievement count happens to share a common factor with the true total (though this chance becomes vanishingly small in games with more than a few Achievements).
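The factor-of-two caveat can be illustrated with exact fractions, which take floating-point noise out of the picture entirely. In this hypothetical game, the total and every Achievement count are even. Each share therefore reduces to a denominator of half the real total. The smallest total consistent with all of them, the least common multiple of the reduced denominators, comes out at 62,587 instead of 125,174. The player numbers are invented for illustration.

```python
import math
from fractions import Fraction

# Hypothetical game: 125,174 players; two Achievements unlocked by 16 and 2,468.
true_total = 125_174
counts = [16, 2_468]

# Fractions reduce automatically, so the shared factor of 2 disappears.
shares = [Fraction(k, true_total) for k in counts]

# Smallest total consistent with every share: LCM of the reduced denominators.
inferred = math.lcm(*(s.denominator for s in shares))
print(inferred, true_total // inferred)  # 62587 2
```

In general, the inferred figure is the true total divided by the greatest common divisor of the total and all the Achievement counts. Each additional Achievement makes it less likely that this divisor is anything other than 1.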
By July 4, Valve had updated its Steam API to provide much less precise Achievement percentages, cutting off this new data source altogether. That move comes just months after Valve started protecting individual Steam usage data by default, cutting off the previous estimation method used by Steam Gauge and Steam Spy. Valve Head of Business Development Jan-Peter Ewert said the company is currently working on a “more accurate” way for users and developers to “get data out of Steam,” though apparently this kind of Achievement-derived data set wasn’t what he had in mind.
Before the Achievement data hole could be plugged, Sergey Galyonkin was able to integrate the method into the machine learning algorithm used for Steam Spy, where the data was briefly displayed on individual game pages earlier in the week. As a public service, and with Galyonkin’s permission, we’re able to share the Achievement-derived player numbers he collected in this handy CSV file. (Ars was able to confirm the reliability of this data through API-based spot checks earlier in the week.) The top 1,000 games by player numbers are also listed on the following pages, for convenience.
This snapshot, accurate as of July 1, will surely grow less useful as time goes on, and it isn’t useful at all for the significant portion of the Steam library that doesn’t use Achievements. (Such games aren’t included in the data set.) Despite that, and the other caveats listed above, we’re happy to share what is probably the most robust and precise data currently available regarding the relative popularity of a large proportion of the Steam library.
Besides satisfying the curiosity of fans, this kind of data can help increase our understanding of the shape of the PC games market. This is the kind of data that other entertainment industries take for granted—in the form of regular reports on box office receipts and TV ratings, for instance—but which remains frustratingly opaque for the game industry at large.
As Valve’s Ewert told developers recently, “The only way we make money is if you make good decisions in bringing the right games to the platform and finding your audience.” For now, at least, this data leak can help us all better understand many of the games that are “finding an audience” of Steam players.