3.5 Million Chess Games
Text Detection
License: CC-BY-SA 4.0

Overview

Context

ChessDB is a free chess database. Chess databases in general provide: (1) a convenient way to store your own games so that the right game can be found quickly; games can be sorted and searched by a number of different criteria; (2) a very time-efficient method of study, giving the maximum improvement in the minimum time. The features that help you study vary from database to database, and the specific features of ChessDB are described on the ChessDB website; and (3) access to statistics about both your own games and those of GMs, which would be impractical to collect without a computer-based chess database.

For more detail: http://chessdb.sourceforge.net/

Content

http://chess-research-project.readthedocs.io/en/latest/

The original dataset contains around 3.5 million games. The plot below shows the cumulative number of games played as a function of time.
[Figure: cumulative number of games played as a function of time]
The oldest game in the database is from the year 1783 (a blindfold game by Philidor). After that, roughly four periods can be distinguished. The first period consists of old games for which the data is very sparse. The second period, starting at the end of the 1800s, is when chess became a popular game. A third period begins around 1960, during the Cold War, when chess became a serious matter, i.e. a highly competitive and professional sport. Finally, a fourth period starts around 1998 with the mass adoption of the Internet. This last period contains most of the games and is the most consistent one. After filtering out games without a complete date (day, month, year), around 1.4 million games remain.
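The date filter described above can be sketched as follows. This is only an illustration, not the procedure used to build the dataset: it assumes that a complete date has the form year.month.day with all digits present, and that missing parts are written with question marks as in the PGN convention (for example 1783.??.??). The helper name has_complete_date is hypothetical.

```python
import re

# Assumption: a complete date is four digits, two digits, two digits,
# separated by dots (year.month.day). Anything else counts as incomplete.
_COMPLETE_DATE = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def has_complete_date(date_field: str) -> bool:
    """Return True if the date field has a fully specified year, month and day."""
    return _COMPLETE_DATE.fullmatch(date_field.strip()) is not None


# Hypothetical date values, made up for illustration only.
dates = ["2000.03.14", "1783.??.??", "1998.07.??"]
complete = [d for d in dates if has_complete_date(d)]
```

The check is purely syntactic; it does not verify that the month and day are valid calendar values.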

After the comments and the description of the columns, each line corresponds to one and only one game. The first columns describe attributes of the game, such as the date on which it was played, the name of the players, etc. The last columns, from column 17 onwards, after the token ###, contain the sequence of moves of the game. The columns describing the game attributes are:
1. Position of the game in the original PGN file.
2. Date on which the game was played (the format is year.month.day).
3. Game result, specified inside brackets in the PGN file. The value can be 1, 0 or -1, corresponding to a white win, a draw or a white loss, respectively.
4. ELO of the white player (an integer).
5. ELO of the black player (an integer).
6. Number of moves in the game (for some games it may be zero!).
7. date_c: is the date (in year.month.day) corrupted or missing? The label is date_true, meaning the date is corrupted, or date_false, meaning the date is NOT corrupted. The same logic applies to the following attributes ending in "_c" (i.e. _corrupted).
8. resu_c: is the result (1-0, 1/2-1/2, or 0-1) corrupted or missing?
9. welo_c: is the white ELO corrupted or missing?
10. belo_c: is the black ELO corrupted or missing?
11. edate_c: is the event date corrupted or missing? This refers to the event where the game was held (if there is one).
12. setup: setup_true or setup_false. If it is true, the initial position of the game is specified. This is used when playing Fischer Random Chess, for example.
13. fen: fen_true or fen_false. It is related to column 12.
14. In the original file the result is provided in two places: at the end of each sequence of moves and in the attributes part. This flag indicates whether the result is properly provided after the sequence of moves (just for checking consistency in the PGN file).
15. oyrange: oyrange_true or oyrange_false. This flag is false only for games with dates in the range of years [1998, 2007]. The name oyrange means "out of year range".
16. bad_len (or bad len): blen_true or blen_false, indicating whether the length of the game is (blen_true) or is not (blen_false) good.
Finally, after the token ###, you can find the sequence of moves. Each move has a number n and a letter W (white) or B (black), indicating the n-th move of the white or black player, respectively.
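As a rough sketch of how one game line could be split according to the layout described above, the following Python snippet separates the attribute columns from the move sequence. It rests on several assumptions that should be checked against the actual file: that columns are whitespace-separated, that the token ### appears exactly once per line, and that there are 16 attribute columns before it. The key names (for example "result_check"), the function parse_game_line, and the sample line, including its move notation, are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical keys for the 16 attribute columns, in the order described above.
# Only some of these names appear in the dataset description; the rest are made up.
ATTRIBUTE_NAMES = [
    "position", "date", "result", "white_elo", "black_elo", "n_moves",
    "date_c", "resu_c", "welo_c", "belo_c", "edate_c",
    "setup", "fen", "result_check", "oyrange", "bad_len",
]


def parse_game_line(line: str) -> Tuple[Dict[str, str], List[str]]:
    """Split one game line into an attribute dict and a list of move tokens."""
    head, separator, tail = line.partition("###")
    if not separator:
        raise ValueError("line does not contain the ### token")
    values = head.split()
    if len(values) != len(ATTRIBUTE_NAMES):
        raise ValueError(
            f"expected {len(ATTRIBUTE_NAMES)} attribute columns, got {len(values)}"
        )
    # Values are kept as strings; converting ELO or result to int is left
    # to the caller, since those fields may be flagged as corrupted.
    return dict(zip(ATTRIBUTE_NAMES, values)), tail.split()


# A made-up line for illustration; real lines may differ in detail.
SAMPLE_LINE = (
    "1 2000.03.14 1 2851 2748 3 date_false resu_false welo_false belo_false "
    "edate_false setup_false fen_false resu2_false oyrange_false blen_false "
    "### W1.d4 B1.d5 W2.c4"
)

attributes, moves = parse_game_line(SAMPLE_LINE)
```

Keeping every value as a string avoids failures on corrupted fields; the "_c" flags can then be used to decide which rows are safe to convert.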

For more detail: http://chess-research-project.readthedocs.io/en/latest/

Acknowledgements

See https://chess-research-project.readthedocs.io/en/latest/ for an excellent summary of this dataset. I have done some analysis of this at http://dataanalysis.world/analyzing-a-chess-game-dataset-with-r.

Memory Kernel in the Expertise of Chess Players, A.L. Schaigorodsky, J.I. Perotti, O.V. Billoni, submitted (2015) arXiv:1504.06611

Memory and long range correlations in chess games, A.L. Schaigorodsky, J.I. Perotti, O.V. Billoni, Phys. A 394, 304-311 (2013) arXiv:1307.0729

Innovation and Nested Preferential Growth in Chess Playing Behavior, J.I. Perotti, H.-H. Jo, A.L. Schaigorodsky, O.V. Billoni, Europhys. Lett. 104, 48005 (2013) arXiv:1309.0336

Data summary
Data format: text
Data count: 1
File size: 195.06MB
Publisher: wanshun1