Author: Robert Hyatt
Date: 14:09:29 06/19/98
On June 19, 1998 at 11:04:52, Johan Melin wrote:

>On June 19, 1998 at 00:30:13, Robert Hyatt wrote:
>
>>On June 18, 1998 at 19:58:03, Edward Screven wrote:
>>
>>>On June 18, 1998 at 17:32:53, Robert Hyatt wrote:
>>>
>>>>On June 18, 1998 at 16:50:19, Edward Screven wrote:
>>>>
>>>>>my understanding of the crafty book building procedure is that
>>>>>you scan a pgn input file, streaming <position,win,loss,draw>
>>>>>records through an aggregating sort, and it's the disk sort
>>>>>runs that require lots of space.
>>>>>
>>>>>if this is correct, then a simple way to reduce your temporary
>>>>>space requirements by 1/N, at a cost of making N passes over
>>>>>the pgn input, is to partition the position keys into N equal
>>>>>sized ranges. make N passes over the input pgn file. on the
>>>>>i-th pass, discard all positions which are not in the i-th range.
>>>>>the independently sorted results of each pass can be appended to
>>>>>your final output file.
>>>>>
>>>>> - edward
>>>>
>>>>
>>>>here's what happens. I first parse the pgn and output (now) a 9-byte
>>>>record for each move: an 8-byte hash signature, plus 1 byte with the result
>>>>of the game (3 bits) and the !/?/etc. flags (5 bits). This is streamed
>>>>out to a file.
>>>>
>>>>I then read this back in a huge chunk at a time, into a memory buffer,
>>>>call qsort() to sort (not a disk sort) and then save each of these
>>>>chunks in a separate file. Just as I finish this, I have two copies
>>>>of everything (note I now write 9-byte records; I was writing 20-byte
>>>>records [linux] or 24 [windows]).
>>>>
>>>>I now delete the original unsorted input, then do a simple N-input
>>>>merge and write the book out with some indices on the front to give me
>>>>quick access.
>>>
>>>got it. the next step you might consider is eliminating the first
>>>intermediate file. instead of streaming the unordered records to
>>>a file, write them directly into the memory sort buffer. each time
>>>the buffer fills, sort it and spill it to disk as a run. merge and
>>>aggregate as before. that should cut the disk space required in half.
>>>
>>>if you want to try the key range partitioning trick, you would apply
>>>it at the point you write the records into the memory buffer -- just
>>>discard the records that aren't part of the current pass.
>>>
>>> - edward
>>
>>
>>it actually doesn't change a thing, because I first have to sort/write,
>>which takes O(N) space, then I have to merge, and during the merge, I
>>need O(2N) space.
>>
>>So the space doesn't change, although I have been working today on doing
>>the sort at the parsing step as you just suggested, because I wanted to
>>dump the extra I/O going on...
>
>In the intermediate files you get several identical positions occurring over and
>over ... but the identical ones will only be one entry in the book. And those
>that occur fewer than 3 times don't make it into the book at all.
>
>If I understand Edward correctly, he is suggesting that when you parse the PGN
>you only output the positions with a key that starts with, say, 00001..., put
>those into memory, sort, and eliminate duplicates and those that only occur once
>or twice. Then you append that to the book file. And that stuff won't need
>further processing.
>
>Then you go on with the next pass, taking only those that start with 00010,
>appending the output to the book ...
>
>Divide it into sufficiently many passes, and the intermediate stuff wouldn't
>have to be stored on disk at all ...
>
>/Johan

This turns an operation that is O(N) in CPU time and O(2N) in space into
one that is O(N^2) in time, to get O(N) space. I don't like that trade-off.
I can build a 350,000-game book with 25,000,000 moves in under 40 minutes
on my P6, and in under 10 minutes on a good alpha. The factor of two in
space isn't worth the factor of N in speed, because N is *big*.
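
For anyone following along, here is a minimal sketch in C of the 9-byte
record and the chunk-sort/spill step described above. The struct layout,
field packing, file names, and helper names are illustrative assumptions,
not Crafty's actual source.

/*
 * Sketch of the 9-byte book record: an 8-byte hash signature plus one
 * status byte (3 bits of game result, 5 bits of !/? annotation flags),
 * chunk-sorted with qsort() and spilled to a run file for a later
 * N-input merge.  All names here are assumptions, not Crafty's code.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t key[8];  /* hash signature, big-endian so memcmp() orders it */
    uint8_t status;  /* bits 0-2: result, bits 3-7: annotation flags */
} BookRecord;        /* 9 bytes of payload; written field-by-field */

/* Store the 64-bit hash big-endian so a byte-wise compare sorts records
   in the same order a numeric compare on the key would. */
static void pack_key(BookRecord *r, uint64_t hash) {
    for (int i = 0; i < 8; i++)
        r->key[i] = (uint8_t)(hash >> (8 * (7 - i)));
}

static int compare_records(const void *a, const void *b) {
    return memcmp(((const BookRecord *)a)->key,
                  ((const BookRecord *)b)->key, 8);
}

/* One "huge chunk" step: sort the in-memory buffer, then spill it to
   disk as run000.tmp, run001.tmp, ... to be merged afterwards. */
static int spill_run(BookRecord *buf, size_t n, int run_no) {
    char name[64];
    FILE *f;

    qsort(buf, n, sizeof(BookRecord), compare_records);
    snprintf(name, sizeof(name), "run%03d.tmp", run_no);
    if ((f = fopen(name, "wb")) == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {     /* exactly 9 bytes per record */
        fwrite(buf[i].key, 1, 8, f);
        fwrite(&buf[i].status, 1, 1, f);
    }
    fclose(f);
    return 0;
}

int main(void) {
    /* Toy demo: three out-of-order keys become one sorted run file. */
    BookRecord buf[3];
    uint64_t keys[3] = { 0xDEADBEEFULL, 3ULL, 0x0123456789ABCDEFULL };
    for (int i = 0; i < 3; i++) {
        pack_key(&buf[i], keys[i]);
        buf[i].status = 0;
    }
    return spill_run(buf, 3, 0) ? 1 : 0;
}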
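
And a small sketch of the key-range partitioning filter Edward and Johan
describe, with an assumed pass count and assumed names:

#include <stdint.h>

#define PASSES 16  /* assumed number of key ranges; pick to fit memory */

/* On pass `pass` (0..PASSES-1), keep a record only if the high byte of
   its key falls in this pass's slice of the key space; everything else
   is discarded and regenerated on a later pass over the PGN input. */
static int in_pass(const uint8_t key[8], int pass) {
    return key[0] / (256 / PASSES) == pass;
}

Each pass then sorts roughly 1/PASSES of the records in memory, which is
exactly the trade-off being debated: the temporary disk space shrinks,
but the PGN input gets parsed PASSES times.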