Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: SAN problem with Crafty's wall.pgn book

Author: Robert Hyatt

Date: 21:30:13 06/18/98

Go up one level in this thread


On June 18, 1998 at 19:58:03, Edward Screven wrote:

>On June 18, 1998 at 17:32:53, Robert Hyatt wrote:
>
>>On June 18, 1998 at 16:50:19, Edward Screven wrote:
>>
>>>my understanding of the crafty book building procedure is that
>>>you scan a pgn input file, streaming <position,win,loss,draw>
>>>records through an aggregating sort, and it's the disk sort
>>>runs that require lots of space.
>>>
>>>if this is correct, then a simple way to reduce your temporary
>>>space requirements by 1/N, at a cost of making N passes over
>>>the pgn input, is to partition the position keys into N equal
>>>sized ranges.  make N passes over the input pgn file.  on the
>>>i-th pass, discard all positions which are not in the i-th range.
>>>the independently sorted results of each pass can be appended to
>>>your final output file.
>>>
>>>  - edward
>>
>>
>>here's what happens.  I first parse the pgn and output (now) a 9
>>byte record for each move, 8 byte hash signature, 1 byte with result
>>of the game (3 bits) and the !/?/etc flags (5 bits).  This is streamed
>>out to a file.
>>
>>I then read this back in a huge chunk at a time, into a memory buffer,
>>call qsort() to sort (not a disk sort) and then save each of these
>>chunks in a separate file.  Just as I finish this, I have two copies
>>of everything (note I now write 9 byte records, I was writing 20 byte
>>records [linux] or 24 [windows]).
>>
>>I now delete the original unsorted input, then do a simple N-input
>>merge and write the book out with some indices on the front to give me
>>quick access.
>
>got it.  the next step you might consider is eliminating the first
>intermediate file.  instead of streaming the unordered records to
>a file, write them directly into the memory sort buffer.  each time
>the buffer fills, sort it and spill it to disk as a run.  merge and
>aggregate as before. that should cut the disk space required in half.
>
>if you want to try the key range partioning trick, you would apply
>it at the point you write the records into the memory buffer -- just
>discard the records that aren't part of the current pass.
>
>  - edward


it actually doesn't change a thing, because I first have to sort/write,
which takes O(N) space, then I have to merge, and during the merge, I
need O(2N) space.

So the space doesn't change, although I have been working today on doing
the sort at the parsing step as you just suggested, because I wanted to
dump the extra I/O going on...



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.