Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: SAN problem with Crafty's wall.pgn book

Author: Edward Screven

Date: 16:58:03 06/18/98

Go up one level in this thread


On June 18, 1998 at 17:32:53, Robert Hyatt wrote:

>On June 18, 1998 at 16:50:19, Edward Screven wrote:
>
>>my understanding of the crafty book building procedure is that
>>you scan a pgn input file, streaming <position,win,loss,draw>
>>records through an aggregating sort, and it's the disk sort
>>runs that require lots of space.
>>
>>if this is correct, then a simple way to reduce your temporary
>>space requirements by 1/N, at a cost of making N passes over
>>the pgn input, is to partition the position keys into N equal
>>sized ranges.  make N passes over the input pgn file.  on the
>>i-th pass, discard all positions which are not in the i-th range.
>>the independently sorted results of each pass can be appended to
>>your final output file.
>>
>>  - edward
>
>
>here's what happens.  I first parse the pgn and output (now) a 9
>byte record for each move, 8 byte hash signature, 1 byte with result
>of the game (3 bits) and the !/?/etc flags (5 bits).  This is streamed
>out to a file.
>
>I then read this back in a huge chunk at a time, into a memory buffer,
>call qsort() to sort (not a disk sort) and then save each of these
>chunks in a separate file.  Just as I finish this, I have two copies
>of everything (note I now write 9 byte records, I was writing 20 byte
>records [linux] or 24 [windows]).
>
>I now delete the original unsorted input, then do a simple N-input
>merge and write the book out with some indices on the front to give me
>quick access.

got it.  the next step you might consider is eliminating the first
intermediate file.  instead of streaming the unordered records to
a file, write them directly into the memory sort buffer.  each time
the buffer fills, sort it and spill it to disk as a run.  merge and
aggregate as before. that should cut the disk space required in half.

if you want to try the key range partioning trick, you would apply
it at the point you write the records into the memory buffer -- just
discard the records that aren't part of the current pass.

  - edward




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.