Computer Chess Club Archives



Subject: Re: SAN problem with Crafty's wall.pgn book

Author: Johan Melin

Date: 08:04:52 06/19/98



On June 19, 1998 at 00:30:13, Robert Hyatt wrote:

>On June 18, 1998 at 19:58:03, Edward Screven wrote:
>
>>On June 18, 1998 at 17:32:53, Robert Hyatt wrote:
>>
>>>On June 18, 1998 at 16:50:19, Edward Screven wrote:
>>>
>>>>my understanding of the crafty book building procedure is that
>>>>you scan a pgn input file, streaming <position,win,loss,draw>
>>>>records through an aggregating sort, and it's the disk sort
>>>>runs that require lots of space.
>>>>
>>>>if this is correct, then a simple way to reduce your temporary
>>>>space requirements by 1/N, at a cost of making N passes over
>>>>the pgn input, is to partition the position keys into N equal
>>>>sized ranges.  make N passes over the input pgn file.  on the
>>>>i-th pass, discard all positions which are not in the i-th range.
>>>>the independently sorted results of each pass can be appended to
>>>>your final output file.
>>>>
>>>>  - edward
>>>
>>>
>>>here's what happens.  I first parse the pgn and output (now) a 9
>>>byte record for each move, 8 byte hash signature, 1 byte with result
>>>of the game (3 bits) and the !/?/etc flags (5 bits).  This is streamed
>>>out to a file.
>>>
>>>I then read this back in a huge chunk at a time, into a memory buffer,
>>>call qsort() to sort (not a disk sort) and then save each of these
>>>chunks in a separate file.  Just as I finish this, I have two copies
>>>of everything (note I now write 9 byte records, I was writing 20 byte
>>>records [linux] or 24 [windows]).
>>>
>>>I now delete the original unsorted input, then do a simple N-input
>>>merge and write the book out with some indices on the front to give me
>>>quick access.
>>
>>got it.  the next step you might consider is eliminating the first
>>intermediate file.  instead of streaming the unordered records to
>>a file, write them directly into the memory sort buffer.  each time
>>the buffer fills, sort it and spill it to disk as a run.  merge and
>>aggregate as before. that should cut the disk space required in half.
>>
>>if you want to try the key range partitioning trick, you would apply
>>it at the point you write the records into the memory buffer -- just
>>discard the records that aren't part of the current pass.
>>
>>  - edward
>
>
>it actually doesn't change a thing, because I first have to sort/write,
>which takes O(N) space, then I have to merge, and during the merge, I
>need O(2N) space.
>
>So the space doesn't change, although I have been working today on doing
>the sort at the parsing step as you just suggested, because I wanted to
>dump the extra I/O going on...
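The sort/merge pipeline quoted above could be sketched roughly like this (the 9-byte record layout follows Hyatt's description; the helper names and file handling are illustrative, not Crafty's actual code):

```python
# Rough sketch of the described pipeline: fixed-size records are sorted
# in memory one chunk at a time, each chunk is spilled to disk as a
# sorted run, and the runs are merged back into one ordered stream.
import heapq
import os
import struct
import tempfile

RECORD = struct.Struct(">QB")  # 8-byte hash signature + 1 result/flags byte = 9 bytes

def spill_sorted_runs(records, chunk_size):
    """Sort chunks in memory (the qsort() step) and write each as a run file."""
    run_files = []
    for start in range(0, len(records), chunk_size):
        chunk = sorted(records[start:start + chunk_size])
        f = tempfile.NamedTemporaryFile(delete=False)
        for key, flags in chunk:
            f.write(RECORD.pack(key, flags))
        f.close()
        run_files.append(f.name)
    return run_files

def read_run(path):
    """Stream (key, flags) records back from one sorted run file."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(RECORD.size)
            if not buf:
                break
            yield RECORD.unpack(buf)

def merge_runs(run_files):
    """Simple N-input merge of the sorted runs into one ordered stream."""
    yield from heapq.merge(*(read_run(p) for p in run_files))

# usage: four records, chunks of two -> two runs, merged back in key order
records = [(0xBEEF, 1), (0x0001, 0), (0xCAFE, 2), (0x0002, 1)]
runs = spill_sorted_runs(records, chunk_size=2)
merged = list(merge_runs(runs))
for p in runs:
    os.remove(p)
```

Note the space behavior the thread is discussing: while the merge runs, the run files and the output coexist, which is where the O(2N) disk requirement comes from.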

In the intermediate files you get several identical positions occurring over and
over .. but the identical ones will only be one entry in the book. And those
that occur fewer than 3 times don't make it into the book at all.

If I understand Edward correctly, he is suggesting that when you parse the PGN
you only output the positions with a key that starts with, say, 00001..., put
those into memory, sort, eliminate duplicates and those that occur only once or
twice. Then you append that to the book file. And that stuff won't need further
processing.

Then you go on with the next pass, taking only those that start with 00010,
appending the output to the book ...

Divide it into sufficiently many passes, and the intermediate stuff wouldn't
have to be stored on disk at all ...
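A minimal sketch of that multi-pass idea, assuming 64-bit hash keys partitioned on their high bits (the 3-occurrence cutoff comes from the discussion above; the function name and in-memory "book" list are just illustrative):

```python
# Sketch of key-range partitioning: scan the full input once per pass,
# keep only the keys in this pass's range, sort and count them in
# memory, and append the survivors straight to the book. No pass's
# intermediate data ever needs to hit the disk.
from collections import Counter

MIN_OCCURRENCES = 3  # positions seen fewer times than this are dropped

def build_book(all_keys, num_passes):
    book = []  # stands in for the final book file being appended to
    span = (1 << 64) // num_passes
    for i in range(num_passes):
        lo, hi = i * span, (i + 1) * span
        # re-scan the full input, keeping only this pass's key range
        counts = Counter(k for k in all_keys if lo <= k < hi)
        for key in sorted(counts):
            if counts[key] >= MIN_OCCURRENCES:
                book.append(key)
    return book

# usage: key 7 occurs three times and survives; key 5 occurs only
# twice and is dropped; the large key lands in a later pass
keys = [7, 5, 7, 5, 7, (1 << 63) + 1, (1 << 63) + 1, (1 << 63) + 1]
print(build_book(keys, num_passes=4))
```

The trade is exactly the one Edward states: temporary space shrinks by a factor of N in exchange for N scans of the PGN input.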

/Johan




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.