
We’ll take a look at more than 2 million games, taken from the MillionBase PGN database. I wanted to do something like this for a long time, and I think it’s finally at a point where I can release it into the wild. I was interested to see what kinds of visualizations I could make, and what patterns would be revealed by considering so many games. I also wanted to show off some of the software I wrote for this purpose; check out the last section for links to it.

It was the biggest collection of games I could find, spanning games from 1801 up to 2013, with players rated between 215 (wow!) and 2861 (I wonder who that is?). I ignored any Chess960 games it contains; in total there are 2,197,113 games. So I think it’s a pretty good representation of chess games all around.

White wins a bit more often than Black, 39% to 30%, with nearly all of the rest being draws. It’s not very surprising, although I would have expected the difference to be smaller. There is also a negligible number of unfinished games, and I couldn’t discern what happened to those.

Right from the first move, e4 dominates by a large margin (48%), followed by d4 (34%). I would have expected the gap between these two most popular moves to be smaller, but given the time range of this database, it seems reasonable.

Of course, the most common reply to e4 is the beloved Sicilian (c5), my favorite reply as Black, even more common than e5. After 1.e4 c5, White plays Nf3 by far the most, aside from some respectable chunks for the Alapin (2.c3) and the Closed Sicilian (2.Nc3). Interestingly, Black’s second move seems to be a critical crossroads for how the game will play out: after it, the next couple of moves in the opening phase are almost set in stone. I will focus on some of my favorite openings and an insight I found to be quite interesting.
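The percentages in this post come from tallying game results and the first few moves over the whole database. The software linked in the last section is not reproduced here. What follows is only a rough, hypothetical sketch using the Python standard library. It reads a PGN file line by line, skips Chess960 games, and counts results, White's first move, Black's reply to 1.e4, and White's second move after 1.e4 c5. The function names (`iter_games`, `san_moves`, `tally`, `share`) are made up for illustration. It assumes that Chess960 games are marked with a `Variant` tag. It strips `{...}` comments and `(...)` variations but not `;` line comments, and it does not validate moves.

```python
import re
from collections import Counter

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')   # e.g. [Result "1-0"]
COMMENT_RE = re.compile(r"\{[^}]*\}")          # brace comments only (';' comments not handled)
VARIATION_RE = re.compile(r"\([^()]*\)")       # innermost (...) variation
MOVE_NUMBER_RE = re.compile(r"^\d+\.+")        # "1." / "1..." prefixes
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def iter_games(lines):
    """Yield (tags, movetext) for each game in an iterable of PGN lines."""
    tags, moves = {}, []
    for line in lines:
        line = line.strip()
        if line.startswith("%"):                # PGN escape lines are ignored
            continue
        match = TAG_RE.match(line)
        if match:
            if moves:                           # a tag after movetext starts a new game
                yield tags, " ".join(moves)
                tags, moves = {}, []
            tags[match.group(1)] = match.group(2)
        elif line:
            moves.append(line)
    if tags or moves:
        yield tags, " ".join(moves)


def is_chess960(tags):
    # Assumption: Chess960 games carry a Variant tag such as "Chess960" or "fischerandom".
    variant = tags.get("Variant", "").lower()
    return "960" in variant or "fischer" in variant


def san_moves(movetext, limit):
    """Return up to `limit` mainline SAN moves, skipping comments, variations and NAGs."""
    text = COMMENT_RE.sub(" ", movetext)
    while True:                                 # peel nested variations from the inside out
        reduced = VARIATION_RE.sub(" ", text)
        if reduced == text:
            break
        text = reduced
    moves = []
    for token in text.split():
        token = MOVE_NUMBER_RE.sub("", token)   # "1.e4" -> "e4", "1." -> ""
        if not token or token.startswith("$") or token in RESULTS:
            continue
        moves.append(token.rstrip("!?"))        # drop annotation glyphs like "!?"
        if len(moves) == limit:
            break
    return moves


def tally(lines):
    """Count results and early moves over all non-Chess960 games."""
    stats = {
        "total": 0,
        "results": Counter(),         # keyed by the Result tag
        "first_move": Counter(),      # White's first move
        "reply_to_e4": Counter(),     # Black's reply to 1.e4
        "sicilian_move2": Counter(),  # White's second move after 1.e4 c5
    }
    for tags, movetext in iter_games(lines):
        if is_chess960(tags):
            continue
        stats["total"] += 1
        stats["results"][tags.get("Result", "*")] += 1
        moves = san_moves(movetext, limit=3)
        if len(moves) >= 1:
            stats["first_move"][moves[0]] += 1
        if len(moves) >= 2 and moves[0] == "e4":
            stats["reply_to_e4"][moves[1]] += 1
        if len(moves) >= 3 and moves[:2] == ["e4", "c5"]:
            stats["sicilian_move2"][moves[2]] += 1
    return stats


def share(counter, total):
    """Percentage of `total` for each key, most common first."""
    return {key: 100 * count / total for key, count in counter.most_common()}
```

Because `tally` accepts any iterable of lines, an open file can be passed directly. For example, `share(tally(f)["first_move"], total)` on a file opened with `open(path, encoding="latin-1")`, where the path and the encoding are assumptions. This way a multi-gigabyte PGN is streamed rather than loaded into memory at once.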
