Author: Scott Gasch
Date: 12:44:07 11/09/01
On November 09, 2001 at 11:54:42, John Merlino wrote:

Hi John. I don't have hard numbers available without re-creating the book, which I will do tonight when I am home from work... and report more details if you want them. But here is my recollection...

>1) In general, how long does it take to create a book (i.e. size to time ratio).

I create Monsoon's book from a PGN file with over 500,000 games in it. (I got this file from The Week in Chess (TWIC) archives, pitt.edu, chessclub.com's GM library, and other Internet sources.) I have built the book on two machines -- a 450MHz/576MB FreeBSD machine and a 1.2GHz/1GB Win2k machine. It takes about 10 minutes on the FreeBSD box and about 3 minutes on the Win2k box. The final book binary is about 90MB in size, but this is "processed"; see below.

My book building process is to allocate a huge buffer in memory and lock it in, so that it does not swap, and then to use this buffer as a hash table. The PGN parser reads a game and "plays" each move _up to move 25_ (ply 50). Each position signature + next move is used as a hash key to create an entry in this huge memory buffer (or, if the entry is already there and we've seen this position+move before, to update that entry's win/draw/loss counts). I use linear probing in the event of a collision in the hash.

When adding a new entry to the buffer I keep track of the buffer's fullness. If, during the creation process, the buffer ever gets over 90% full, I run a "strainer" on the buffer. The strainer goes through the memory and drops unpopular entries (starting with "seen this position+move only one time") until the percent-full drops below 90% again. The 90% is arbitrary, but I figured the linear probing would be getting pretty slow at 90% full. On the Win2k machine with 1GB of memory, my buffer is large enough that this intermediate straining never has to happen. On the FreeBSD machine it happens once, around game 300,000 in the PGN file.

When the PGN file is exhausted I do a final strain on the book (to eliminate positions I have seen only once in all 500,000 games). I do this to keep the book binary on disk a reasonable size. Then I sort the buffer on position signature with a quicksort. Once it's sorted, I dump it out to disk. That's my book. (There's a rough sketch of the hash table and strainer below.)

I originally was building the book in 128MB of memory which, despite my trying to keep the size of the buffer sane, caused swapping and took all night to complete. This really sucked, as it was back when I was still debugging the book code: I'd go to sleep expecting to wake up to a nice new book, only to find an assertion had fired or the PGN parser had crashed. As I'm sure you know, with a 500,000 game PGN file you can't verify the input by hand... and there are all sorts of crazy text constructs in there that cause the PGN reader trouble (e.g. foreign language piece letters, lines used to separate PGN games, tournament standings, email headers, etc.). I have taken to simply skipping any PGN game with "errors" in it and writing out "error in game N at line M." as the book is being built. There are a ton of these, maybe 1% of the 500,000. But with memory so cheap nowadays I decided it was worth it to go out and buy a little more...

Also, if you are on Windows and the process is running with admin privileges, look up VirtualLock. This helps. The FreeBSD equivalent is mlock(2).
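In case the shape of it helps, here is a minimal C sketch of that hash-table-in-a-buffer scheme. The names (book_ent, book_insert, strain) and the 64-bit key with 0 meaning "empty" are illustrative only, not the real Monsoon code, and the chain-repair loop in the strainer is just the simplest correct way I know to fix linear-probe chains after a bulk deletion:

    /* Sketch of the book buffer.  A 64-bit hash of (position signature,
     * next move) is assumed, with key 0 reserved for "empty slot". */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint64_t key;                  /* hash of position + next move */
        uint32_t wins, draws, losses;  /* from the mover's point of view */
    } book_ent;

    static book_ent *table;            /* the big locked buffer */
    static size_t nslots, nused;

    /* The "strainer": drop every entry seen at most min_seen times, then
     * re-insert the survivors until none of them moves.  The re-insertion
     * loop repairs the probe chains broken by the deletions; it stops
     * exactly when every entry is reachable by probing from its home. */
    static void strain(uint32_t min_seen)
    {
        size_t i;
        int moved;

        for (i = 0; i < nslots; i++) {
            book_ent *e = &table[i];
            if (e->key != 0 && e->wins + e->draws + e->losses <= min_seen) {
                e->key = 0;            /* too unpopular: drop it */
                nused--;
            }
        }
        do {
            moved = 0;
            for (i = 0; i < nslots; i++) {
                book_ent tmp = table[i];
                size_t to;
                if (tmp.key == 0)
                    continue;
                table[i].key = 0;
                for (to = tmp.key % nslots; table[to].key != 0;
                     to = (to + 1) % nslots)
                    ;                  /* first empty slot from its home */
                table[to] = tmp;
                if (to != i)
                    moved = 1;
            }
        } while (moved);
    }

    /* Record one observation of (position, move).  result is +1 for a
     * win by the side that moved, 0 for a draw, -1 for a loss. */
    static void book_insert(uint64_t key, int result)
    {
        size_t i = key % nslots;

        while (table[i].key != 0 && table[i].key != key)
            i = (i + 1) % nslots;      /* linear probe on collision */
        if (table[i].key == 0) {       /* fresh entry */
            table[i].key = key;
            nused++;
        }
        if (result > 0)      table[i].wins++;
        else if (result < 0) table[i].losses++;
        else                 table[i].draws++;

        /* Over 90% full?  Strain, raising the popularity cutoff until
         * enough slots come free again. */
        if (nused * 10 > nslots * 9) {
            uint32_t cutoff;
            for (cutoff = 1; nused * 10 > nslots * 9; cutoff++)
                strain(cutoff);
        }
    }

The final sort-and-dump step is omitted here; it is just a quicksort of the surviving entries on the key before writing them out, as described above.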
>2) How much memory does the process take (size to RAM ratio).

Unlike you commercial guys, I can assume that I get to use the whole machine. So I do... My buffer for creating the book is enormous, so as to minimize the intermediate calls to the "strain" routine (which loses information from the final book in exchange for faster book generation). So the ratio of buffer size to total RAM is approximately 9:10.

Hope this helps,
Scott
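P.S. The allocate-and-lock step looks roughly like the sketch below. Again illustrative only: how ram_bytes gets discovered (e.g. sysctl hw.physmem on FreeBSD, GlobalMemoryStatus on Windows) and the working-set padding numbers are placeholders, not real Monsoon code.

    /* Grab ~90% of physical RAM for the book buffer and pin it so it
     * never swaps.  ram_bytes must be discovered elsewhere. */
    #include <stdlib.h>
    #ifdef _WIN32
    #include <windows.h>
    #else
    #include <sys/mman.h>      /* mlock(2) */
    #endif

    static void *alloc_locked(size_t ram_bytes, size_t *out_bytes)
    {
        size_t want = ram_bytes / 10 * 9;  /* the ~9:10 buffer-to-RAM ratio */
        void *buf = malloc(want);

        if (buf == NULL)
            return NULL;
    #ifdef _WIN32
        /* VirtualLock fails unless the working set is big enough, so
         * grow it first; both calls need sufficient privileges and the
         * padding amounts here are arbitrary. */
        SetProcessWorkingSetSize(GetCurrentProcess(),
                                 want + (1 << 20), want + (2 << 20));
        VirtualLock(buf, want);        /* if it fails we just may swap */
    #else
        mlock(buf, want);              /* needs root or raised RLIMIT_MEMLOCK */
    #endif
        *out_bytes = want;
        return buf;
    }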