Author: Robert Hyatt
Date: 21:30:13 06/18/98
Go up one level in this thread
On June 18, 1998 at 19:58:03, Edward Screven wrote: >On June 18, 1998 at 17:32:53, Robert Hyatt wrote: > >>On June 18, 1998 at 16:50:19, Edward Screven wrote: >> >>>my understanding of the crafty book building procedure is that >>>you scan a pgn input file, streaming <position,win,loss,draw> >>>records through an aggregating sort, and it's the disk sort >>>runs that require lots of space. >>> >>>if this is correct, then a simple way to reduce your temporary >>>space requirements by 1/N, at a cost of making N passes over >>>the pgn input, is to partition the position keys into N equal >>>sized ranges. make N passes over the input pgn file. on the >>>i-th pass, discard all positions which are not in the i-th range. >>>the independently sorted results of each pass can be appended to >>>your final output file. >>> >>> - edward >> >> >>here's what happens. I first parse the pgn and output (now) a 9 >>byte record for each move, 8 byte hash signature, 1 byte with result >>of the game (3 bits) and the !/?/etc flags (5 bits). This is streamed >>out to a file. >> >>I then read this back in a huge chunk at a time, into a memory buffer, >>call qsort() to sort (not a disk sort) and then save each of these >>chunks in a separate file. Just as I finish this, I have two copies >>of everything (note I now write 9 byte records, I was writing 20 byte >>records [linux] or 24 [windows]). >> >>I now delete the original unsorted input, then do a simple N-input >>merge and write the book out with some indices on the front to give me >>quick access. > >got it. the next step you might consider is eliminating the first >intermediate file. instead of streaming the unordered records to >a file, write them directly into the memory sort buffer. each time >the buffer fills, sort it and spill it to disk as a run. merge and >aggregate as before. that should cut the disk space required in half. > >if you want to try the key range partioning trick, you would apply >it at the point you write the records into the memory buffer -- just >discard the records that aren't part of the current pass. > > - edward it actually doesn't change a thing, because I first have to sort/write, which takes O(N) space, then I have to merge, and during the merge, I need O(2N) space. So the space doesn't change, although I have been working today on doing the sort at the parsing step as you just suggested, because I wanted to dump the extra I/O going on...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.