Author: Robert Hyatt
Date: 14:32:53 06/18/98
On June 18, 1998 at 16:50:19, Edward Screven wrote:

>my understanding of the crafty book building procedure is that
>you scan a pgn input file, streaming <position,win,loss,draw>
>records through an aggregating sort, and it's the disk sort
>runs that require lots of space.
>
>if this is correct, then a simple way to reduce your temporary
>space requirements by 1/N, at a cost of making N passes over
>the pgn input, is to partition the position keys into N equal
>sized ranges. make N passes over the input pgn file. on the
>i-th pass, discard all positions which are not in the i-th range.
>the independently sorted results of each pass can be appended to
>your final output file.
>
> - edward

Here's what happens. I first parse the PGN and write out (now) a 9-byte record for each move: an 8-byte hash signature, plus 1 byte holding the result of the game (3 bits) and the !/?/etc. flags (5 bits). This is streamed out to a file. I then read it back a huge chunk at a time into a memory buffer, call qsort() to sort each chunk (not a disk sort), and save each sorted chunk in a separate file. Just as I finish this, I have two copies of everything on disk (note that I now write 9-byte records; I was writing 20-byte records [linux] or 24 [windows]). I then delete the original unsorted input, do a simple N-input merge, and write the book out with some indices on the front to give me quick access.

It now takes 9/24ths of the space it used to take under windows, and 9/20ths of the space it used to take under unix. I got rid of the long long in the structure, so there is no alignment or padding, and simply use memcpy to move things around, at a big savings in disk space.

Note that this is version 15.15, which is not yet out, but it is working well on ICC using this new book format.
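For anyone who wants to picture the steps described above, here is a rough sketch in C. It is not the Crafty source: the function names, the exact bit layout of the ninth byte (result in the top 3 bits, flags in the low 5), and the use of in-memory arrays in place of the per-chunk files are all assumptions made for illustration. It shows packing a 9-byte record with memcpy so that no struct padding is involved, sorting one chunk with qsort(), and a simple N-input merge of sorted runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 9 /* 8-byte hash signature + 1 byte of result/flags */

/* Pack one move record into 9 raw bytes. Writing through memcpy instead of
   a struct avoids alignment padding. Bit layout is an assumption:
   result in the top 3 bits, annotation flags in the low 5 bits. */
static void pack_record(unsigned char *out, uint64_t key,
                        unsigned result, unsigned flags)
{
    assert(result < 8 && flags < 32);
    memcpy(out, &key, sizeof key); /* host byte order */
    out[8] = (unsigned char)((result << 5) | flags);
}

static uint64_t record_key(const unsigned char *rec)
{
    uint64_t key;
    memcpy(&key, rec, sizeof key); /* safe for unaligned records */
    return key;
}

static unsigned record_result(const unsigned char *rec) { return rec[8] >> 5; }
static unsigned record_flags(const unsigned char *rec)  { return rec[8] & 0x1F; }

/* qsort() comparator: order records by hash signature. */
static int compare_records(const void *a, const void *b)
{
    uint64_t ka = record_key(a), kb = record_key(b);
    return (ka > kb) - (ka < kb);
}

/* Sort one in-memory chunk of `count` packed records. */
static void sort_chunk(unsigned char *buf, size_t count)
{
    qsort(buf, count, RECORD_SIZE, compare_records);
}

/* Merge n sorted runs into `out`, which must have room for every record.
   Here the runs are arrays; in a real book build they would be the sorted
   chunk files. A linear scan over the run heads is fine for a few runs.
   Returns the number of records written (0 if allocation fails). */
static size_t merge_runs(unsigned char **runs, const size_t *counts,
                         size_t n, unsigned char *out)
{
    size_t written = 0;
    size_t *pos = calloc(n, sizeof *pos);
    if (!pos)
        return 0;
    for (;;) {
        size_t best = n; /* n means "no run selected yet" */
        for (size_t i = 0; i < n; i++) {
            if (pos[i] == counts[i])
                continue; /* this run is exhausted */
            if (best == n ||
                compare_records(runs[i] + pos[i] * RECORD_SIZE,
                                runs[best] + pos[best] * RECORD_SIZE) < 0)
                best = i;
        }
        if (best == n)
            break; /* every run is exhausted */
        memcpy(out + written * RECORD_SIZE,
               runs[best] + pos[best] * RECORD_SIZE, RECORD_SIZE);
        pos[best]++;
        written++;
    }
    free(pos);
    return written;
}
```

Since the key is copied in host byte order, a file of such records is only meaningful on machines with the same endianness; a portable format would fix the byte order when packing.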
Last modified: Thu, 15 Apr 21 08:11:13 -0700