Crawling the World of Warcraft Armory
I play World of Warcraft, and I’ve always been fascinated by the meta-universe that exists outside the game, on the web. Blizzard’s launch of the World of Warcraft Armory makes available all of the character information for everyone that plays the game. Every character’s class, race, gender, equipment, skills, talents, statistics, faction reputation, guild, and more is available to anyone on the web in an easily searchable form.
Zyph of Maelstrom recently posted some interesting results of mining the Armory to the World of Warcraft forums. He pulled the talent specs for around 6000 level 70 characters from the Armory to see what patterns he could find. WoW Insider has a good write-up here.
This inspired me to do the same thing on a larger scale.
My crawler is still running, but at this time, I have data collected on 23603 individual characters from 245 guilds spread across 134 realms. Of those 23603 characters, 7761 are level 70. Here are some initial results from that data set.
This shows the distribution of levels. Note that the Armory doesn’t store data for characters under level 10. No big surprises here.

This shows the male vs female character mix. Clearly players prefer male personas. Not surprising given the majority of players are probably male.

This shows the Alliance vs Horde mix. I was expecting this to be skewed towards Alliance, as this has been the historical trend. Either my population is skewed towards Horde, or more players have been rolling Horde characters since the expansion. The next graph gives some evidence for the latter, as Blood Elves are a significant percentage overall.

This shows the race distribution across all levels. As I mentioned, Blood Elves make a strong appearance at 11.2%, the most popular race after Humans (16.3%), Night Elves (16.3%), and Undead (14.3%). I’m surprised that Undead is so popular, and I was expecting Humans to significantly beat out Night Elves, but they are about equal in my sample. Blood Elves at 11.2% are far more popular than Draenei at 5.9%. In fact, Draenei only just beat out Dwarves at 5.8% to be the second least popular race.

This shows the race distribution, restricted to level 70 characters. Draenei and Blood Elves drop significantly, since they were only introduced in the Burning Crusade and have had less time to reach level 70. Interestingly, at level 70, Humans, Night Elves and Undead move further ahead in popularity relative to the other races.

This shows the class distribution across all levels. The biggest surprise is that, in my sample, Warriors are almost as popular as Hunters. Druids and Shamans come in as the least popular.

This shows the class distribution, restricted to level 70 characters. At level 70, Warriors get a bump over Hunters, becoming the most popular class in my sample. Priests make up some ground to become the third most popular class. Both of these are likely artifacts of the need for Warriors and Priests to level first in a guild as they are needed for all instance runs.

This shows the class distribution, comparing the mix for Horde and Alliance, for all levels. Warriors, Warlocks, and Shamans are significantly more popular for Horde players, while Druids and Paladins are significantly more popular for Alliance players. Of course, the Horde Shaman and Alliance Paladin bias is expected.

This shows the class distribution, comparing the mix for Horde and Alliance, restricted to level 70 characters. This looks pretty similar to the previous graph, except that, by restricting to level 70, the number of Horde Paladins and Alliance Shamans drops significantly. The surprise here is how much less popular Alliance Shamans are compared to Horde Paladins. It seems the Horde have embraced the Paladin, but the Alliance have not had a similar reaction to the Shaman.

Over the next few posts I will continue to present any interesting statistics and patterns that I find.
Technical Details (for those that are interested)
Generating a random sample
The Armory doesn’t allow for partial searches, so it isn’t really possible to get a random sample of characters from the Armory directly. Zyph used character profiles from Allakhazam and selected randomly from the set of profiles he pulled. Those profiles gave him a name and realm to look up in the Armory. There is a bias to the selection here, as Allakhazam fetches data from players that install a Windows add-on that collects data from the WoW client. However, Allakhazam does have profiles for more than 2M players, so it’s not clear that this bias is really an issue.
I decided to try something different, which may introduce more bias, but which samples an interesting player population. Blizzard’s World of Warcraft forums have a character name and realm associated with every post. By crawling random posts on the WoW forums, I can generate a set of characters to look up in the Armory. At the same time that I look up a character from a forum post, I also look up all of the other characters in their guild. Because of this, my sample contains complete guilds, which means I can compute interesting statistics about guilds as well as individual characters.
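The sampling idea above can be sketched roughly as follows. This is only an illustration in Python, not the crawler itself: the Character type, the build_sample function, and the guild_roster lookup are all hypothetical names, and the sketch assumes the forum posters have already been scraped into (name, realm) pairs.

```python
from typing import Callable, Iterable, NamedTuple


class Character(NamedTuple):
    # A character is identified by name plus realm, as on the forums.
    name: str
    realm: str


def build_sample(
    posters: Iterable[Character],
    guild_roster: Callable[[Character], list[Character]],
) -> set[Character]:
    """Expand forum posters into a de-duplicated sample of whole guilds."""
    sample: set[Character] = set()
    for poster in posters:
        if poster in sample:
            # Already covered, either as an earlier poster or as a guild mate.
            continue
        sample.add(poster)
        # guild_roster is a hypothetical lookup (assumption); it is expected
        # to return an empty list for characters that are not in a guild.
        sample.update(guild_roster(poster))
    return sample
```

Keeping the sample as a set of (name, realm) pairs means a character that shows up in several forum posts, or in the roster of a guild that was already fetched, is only counted once.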
Is this population biased? Definitely. Players that post in the forums represent a skewed population. Those players are more actively invested in the game - I mean that in both a good and a bad way. I’m hoping that the addition of every character’s entire guild to the sample will help reduce that bias somewhat.
Extracting the sample and generating statistics
Queries against the Armory return XML that is formatted by an XSLT stylesheet. This makes it really easy to extract the data from the Armory, since you can get XML directly instead of having to parse HTML. For each character, I fetch every page from the Armory (character sheet, reputation, skills, talents, and guild) and then use a set of XPath queries to extract every field. My current program extracts around 200 fields for each character. Each character is then stored as a single record in an Apache Derby database, which allows me to write SQL to directly extract statistics. Apple’s Keynote presentation software was used to draw the graphs.
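For anyone who wants to try something similar, here is a minimal sketch of the extract, store, and query steps. It is not my actual program: it uses Python with SQLite standing in for Apache Derby, it pulls only a handful of fields rather than 200, and the XML layout, element names, attribute names, table schema, and function names are all assumptions made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical response shape (assumption); the real Armory XML has its own
# element and attribute names.
SAMPLE_XML = """<page>
  <characterInfo>
    <character name="Exampla" realm="Maelstrom" class="Warrior"
               race="Orc" gender="Female" level="70" faction="Horde"/>
  </characterInfo>
</page>"""

FIELDS = ("name", "realm", "class", "race", "gender", "level", "faction")


def extract_character(xml_text):
    """Pull the selected attributes off the character element via an XPath query."""
    root = ET.fromstring(xml_text)
    node = root.find("./characterInfo/character")
    if node is None:
        raise ValueError("no character element found")
    return {field: node.get(field, "") for field in FIELDS}


def store(conn, record):
    """Store one character as a single row, keyed by name and realm."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS characters ("
        "name TEXT, realm TEXT, char_class TEXT, race TEXT, "
        "gender TEXT, level INTEGER, faction TEXT, "
        "PRIMARY KEY (name, realm))"
    )
    row = [record[field] for field in FIELDS]
    row[FIELDS.index("level")] = int(record["level"])
    conn.execute(
        "INSERT OR REPLACE INTO characters VALUES (?, ?, ?, ?, ?, ?, ?)", row
    )


def class_distribution(conn, level=None):
    """Count characters per class, optionally restricted to one level."""
    sql = "SELECT char_class, COUNT(*) FROM characters"
    params = ()
    if level is not None:
        sql += " WHERE level = ?"
        params = (level,)
    sql += " GROUP BY char_class ORDER BY COUNT(*) DESC, char_class"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, extract_character(SAMPLE_XML))
    print(class_distribution(conn, level=70))
```

With one row per character, each graph in this post comes down to a GROUP BY over a single column, with an optional WHERE clause for the level 70 versions.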
