Author: John Merlino
Date: 17:25:52 11/09/01
On November 09, 2001 at 15:44:07, Scott Gasch wrote:

>On November 09, 2001 at 11:54:42, John Merlino wrote:
>
>Hi John. I don't have hard numbers available without re-creating the book,
>which I will do tonight when I am home from work... and report more details
>if you want them. But here is my recollection...
>
>>1) In general, how long does it take to create a book (i.e. size to time
>>ratio)?
>
>I create monsoon's book from a PGN file with over 500,000 games in it. (I got
>this file from the Week in Chess (WIC) archives, pitt.edu, chessclub.com's GM
>library, and other Internet sources.) I have built the book on two machines
>-- a 450MHz/576MB FreeBSD machine and a 1.2GHz/1GB Win2k machine. It takes
>about 10 minutes on the FreeBSD box and about 3 minutes on the NT box. The
>final book binary is about 90MB in size, but this is "processed"; see below.
>
>My book-building process is to allocate a huge buffer in memory and lock it
>in so that it does not swap, then to use this buffer as a hash table. The PGN
>parser reads a game and "plays" each move _up to move 25_ (ply 50). Each
>position signature + next move is used as a hash key to create an entry in
>this huge memory buffer (or, if the entry is already there and we've seen
>this position+move before, to update that entry's win/draw/loss ratio). I use
>linear probing in the event of a collision in the hash.
>
>When adding a new entry to the buffer I keep track of the buffer's fullness.
>If, during the creation process, the buffer ever gets over 90% full, I run a
>"strainer" on the buffer. This strainer goes through the memory and drops
>unpopular entries (starting at "seen this position+move only one time") until
>the percent-full drops below 90% again. The 90% is arbitrary, but I figured
>the linear probing would be getting pretty slow at 90% full.
>
>On the NT machine with 1GB of memory, my buffer is large enough that the
>intermediate straining never has to happen. On the FreeBSD machine it happens
>once, around game 300,000 in the PGN file.
>
>When the PGN file is exhausted I do a final strain on the book (to eliminate
>positions I have seen only once in all 500,000 games). I do this to keep the
>book binary on disk a reasonable size. Then I sort the buffer on position
>signature with a quicksort. Once it's sorted I dump it out to disk. That's my
>book.
>
>I originally was building the book in 128MB of memory, which, despite my
>trying to keep the size of the buffer sane, caused swapping and took all
>night to complete. This really sucked, as it was back when I was still
>debugging the book code and I'd go to sleep expecting to wake up to a nice
>new book, only to find an assertion fired or a crash in the PGN parser code.
>As I'm sure you know, with a 500,000-game PGN file you can't verify it by
>hand... and there are all sorts of crazy text constructs in there that cause
>the PGN reader trouble (e.g. foreign-language piece letters, lines to
>separate PGN games, tournament standings, email headers, etc.). I have taken
>to simply skipping any PGN game with "errors" in it and writing out "error in
>game N at line M." as the book is being built. There are a ton of these,
>maybe 1% of the 500,000.
>
>But with memory so cheap nowadays I decided it was worth it to go out and buy
>a little more... Also, if you are on Windows and the process is running with
>admin privileges, look up VirtualLock. This helps. There's a FreeBSD
>equivalent called vmlock or something like that(?).
>
>>2) How much memory does the process take (size to RAM ratio)?
>
>Unlike you commercial guys, I can assume that I can use the whole memory. So
>I do... my buffer for creating the book is enormous, so as to minimize the
>intermediate calls to the "strain" routine (which causes loss of information
>in the final book in order to speed up book generation). So the ratio of
>buffer size to total RAM is approx 9:10.
>
>Hope this helps,
>Scott

To give a quick answer:

Your RAM usage appears to be about the same as mine, but I find it very hard
to believe that you can fully process a half-million-game file and create a
90MB book file in 3 minutes. How is this possible? Just COPYING a 90MB file
takes about 30 seconds on my PIII-600. A half-million-game PGN file is
approximately 350MB, right?

But you're right about general memory availability. We really can't COUNT ON
more than 30-50MB free at any one time (although a more typical machine today
would probably have well over 100MB free). So this limits us quite a bit.

My test was with a book with just under 150,000 games in it. It took about
250MB of RAM (which ended up requiring about 100MB of swapping on my machine)
and a little less than 4 hours to process at a depth of 40 plies. The result
(after pruning zero-weight branches, which is, in effect, the same as your
"straining" process) was a file that was about 550K. If I had not pruned the
zero-weight branches, the file would have been about 6.1MB.

Admittedly, though, this timing is from a debugging build, and I have not
actually tried running it with a release build. However, I think our PGN
reader code is one of the main bottlenecks. It appears to be able to import
only about 100 games/second, and nobody else has reported anything less than
2,000 games/second. That's probably something I should look at.

Thanks for your input,

jm
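
A quick back-of-envelope check using only the numbers quoted above (the
arithmetic is mine, not either poster's): building a book from 500,000 games
in about 3 minutes works out to roughly 500,000 / 180, or about 2,800 games
per second end to end, with parsing, 50-ply replay, and hashing all included.
At the roughly 100 games/second reported above for the slower PGN reader,
parsing that same file would take about 500,000 / 100 = 5,000 seconds, or
roughly 83 minutes, before any book work at all -- consistent with the
suspicion that the PGN reader, rather than the hashing or the disk write, is
the real bottleneck.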
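
For concreteness, a minimal C sketch of the hash-buffer scheme described in
the quoted post: position signature plus move as the key, linear probing on
collision, per-entry win/draw/loss counts, and a strainer that fires when the
table passes 90% full. Everything here is an assumption for illustration, not
monsoon's actual code: the BookEntry layout, the names book_add and
book_strain, the 2^24-slot table, a 64-bit Zobrist-style signature, and a
16-bit packed move. A real strainer working inside a locked, RAM-sized buffer
would also rehash in place rather than allocating a temporary second table as
this sketch does.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint64_t sig;                 /* position signature; 0 marks an empty slot */
        uint16_t move;                /* the move played out of this position      */
        uint32_t wins, draws, losses; /* outcomes of games reaching this entry     */
    } BookEntry;

    #define BOOK_SLOTS (1u << 24)                  /* sized to fill most of RAM    */
    #define STRAIN_AT  ((BOOK_SLOTS / 10u) * 9u)   /* strain when 90% full         */

    static BookEntry *book;          /* = calloc(BOOK_SLOTS, sizeof(BookEntry))    */
    static uint32_t   book_used;

    /* Drop entries seen min_count times or fewer and rehash the survivors. */
    static void book_strain(uint32_t min_count)
    {
        BookEntry *fresh = calloc(BOOK_SLOTS, sizeof *fresh);
        uint32_t i, kept = 0;

        for (i = 0; i < BOOK_SLOTS; i++) {
            BookEntry e = book[i];
            if (e.sig == 0 || e.wins + e.draws + e.losses <= min_count)
                continue;                          /* unpopular: drop it           */
            uint32_t j = (uint32_t)((e.sig ^ e.move) % BOOK_SLOTS);
            while (fresh[j].sig != 0)
                j = (j + 1) % BOOK_SLOTS;          /* linear probing               */
            fresh[j] = e;
            kept++;
        }
        free(book);
        book = fresh;
        book_used = kept;
    }

    /* Record one (position, move) occurrence.  result is +1 for a win, 0 for a
       draw, -1 for a loss; called for every move up to ply 50 of every game.   */
    static void book_add(uint64_t sig, uint16_t move, int result)
    {
        uint32_t i = (uint32_t)((sig ^ move) % BOOK_SLOTS);

        while (book[i].sig != 0 && !(book[i].sig == sig && book[i].move == move))
            i = (i + 1) % BOOK_SLOTS;              /* linear probing               */

        if (book[i].sig == 0) {                    /* brand-new position+move      */
            book[i].sig  = sig;
            book[i].move = move;
            book_used++;
        }
        if (result > 0)      book[i].wins++;
        else if (result < 0) book[i].losses++;
        else                 book[i].draws++;

        if (book_used > STRAIN_AT)                 /* keep probe chains short      */
            book_strain(1);                        /* drop "seen only once"        */
    }

    /* Comparison for the final qsort() by position signature before the
       surviving entries are dumped to disk as the finished book.          */
    static int cmp_sig(const void *a, const void *b)
    {
        const BookEntry *x = a, *y = b;
        if (x->sig != y->sig) return x->sig < y->sig ? -1 : 1;
        return (int)x->move - (int)y->move;
    }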
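
On the memory-locking aside: the Win32 call is VirtualLock, and the
FreeBSD/POSIX equivalent being half-remembered above is presumably mlock(2).
A minimal sketch follows; the wrapper name lock_buffer is mine, and on
Windows a large region generally cannot be locked until the process
working-set size has been raised first.

    #include <stddef.h>

    #ifdef _WIN32
    #include <windows.h>
    /* Lock a large buffer so it cannot be paged out.  Grow the working set
       first, or VirtualLock() will refuse a region this big.               */
    static int lock_buffer(void *buf, size_t bytes)
    {
        if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                      bytes + (16u << 20),   /* min working set */
                                      bytes + (32u << 20)))  /* max working set */
            return -1;
        return VirtualLock(buf, bytes) ? 0 : -1;
    }
    #else
    #include <sys/mman.h>
    /* POSIX/FreeBSD: mlock(2) pins the pages; the process needs a sufficient
       RLIMIT_MEMLOCK (or root) for a buffer this large.                     */
    static int lock_buffer(void *buf, size_t bytes)
    {
        return mlock(buf, bytes);    /* 0 on success, -1 with errno on failure */
    }
    #endif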