Computer Chess Club Archives


Subject: Re: Compiling Crafty's enormous PGN into an opening book

Author: Tony Werten

Date: 21:43:00 09/25/03



On September 24, 2003 at 20:27:02, Steven Edwards wrote:

>On September 24, 2003 at 19:54:35, Robert Hyatt wrote:
>>On September 24, 2003 at 18:29:05, Steven Edwards wrote:
>
>>>I had my toolkit compile Crafty's 945,950,820 byte enormous.pgn game file into
>>>an opening book as a stress test.  The 1,511,728 games required about 14 hours
>>>to process and this resulted in an opening book position library with 34,961,266
>>>entries.  All positions reached in the first 48 ply for each game were stored.
>
>>I am not sure, but the last time I did this it took about 1 hour on an
>>older 500mhz xeon, to read/parse/sort/create that big a book file.  Of
>>course, I don't know what kind of disks you are using, mine are always
>>SCSI of some flavor...
>
>The total I/O time is only a few minutes, so the IDE/SCSI factor is unimportant
>in this case.
>
>Most of the time is spent in the PGN movetext parser.  Unlike a minimal PGN
>scanner, the toolkit's movetext parser is performing every kind of check and
>normalization possible; this includes a rather vigorous near-SAN to SAN mapper
>along with full processing of recursive variations, NAGs, and some features that
>may go into the PGN2005 specification.  It even detects the bogus 0x1a byte near
>the end of the enormous.pgn file.
>
>The resulting opening book requires O(log2(sqrt(N))) probes to fetch an entry.
>This is fast enough to use the book at the upper levels of the search and other
>selected interior nodes.
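
For concreteness, a probe like that boils down to a binary search over a flat
file of sorted, fixed-size entries. A minimal sketch in C, where the entry
layout and names are assumptions rather than the toolkit's actual format:

#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t key;     /* position hash */
    uint32_t count;   /* number of games reaching this position */
} BookEntry;

/* Binary search over the sorted entry file: about log2(nentries) seeks. */
int probe_book(FILE *book, long nentries, uint64_t key, BookEntry *out)
{
    long lo = 0, hi = nentries - 1;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        BookEntry e;

        fseek(book, mid * (long) sizeof(BookEntry), SEEK_SET);
        if (fread(&e, sizeof e, 1, book) != 1)
            return 0;                       /* read error */
        if (e.key == key) {
            *out = e;
            return 1;                       /* found */
        }
        if (e.key < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return 0;                               /* position not in book */
}

With roughly 35 million entries that is on the order of 25 seeks per probe if
nothing is cached.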

I think you can lower this quite a lot.

First, group the positions together by parent node (as Bob once explained),
then do a binary search on the nodes.

If you add moves, just add them under the parent node. Your probing code should
notice that the last node of the binary search points to extra nodes, so after
the binary search you still have to do a sequential search until the next-node
pointer is nil. If this sequential part becomes too big, you'd have to order
them.
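
A minimal sketch of that scheme in C, assuming the sorted part of the book
holds one node per parent position and later additions are chained through a
next index (the record layout and names are illustrative, not XiniX's actual
format):

#include <stdint.h>

typedef struct {
    uint64_t parent;  /* hash of the parent position */
    uint32_t move;    /* move played from that position */
    uint32_t count;   /* number of games containing it */
    uint32_t next;    /* index of an appended node, 0 = nil */
} BookNode;

/* Binary search for the parent's node in the sorted part of the book,
   then walk the chain of appended nodes until the next pointer is nil.
   Appended nodes live past index nsorted, so they never disturb the
   binary search. */
int probe_parent(const BookNode *book, long nsorted, uint64_t parent,
                 const BookNode **moves, int max_moves)
{
    long lo = 0, hi = nsorted - 1, found = -1;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (book[mid].parent == parent) { found = mid; break; }
        if (book[mid].parent < parent)  lo = mid + 1;
        else                            hi = mid - 1;
    }
    if (found < 0)
        return 0;                       /* parent position not in book */

    int  n = 0;
    long i = found;
    for (;;) {                          /* sequential part of the probe */
        if (n < max_moves)
            moves[n++] = &book[i];
        if (book[i].next == 0)          /* nil: end of the appended chain */
            break;
        i = book[i].next;
    }
    return n;
}

The sequential walk stays cheap as long as only a few moves have been appended
since the last sort; once the chains grow long, re-sorting the book folds them
back into the binary-searchable part.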

OTOH, this can make zero difference with book sizes like these if the nodes are
randomly scattered on disk. It is probably better to do 200 lookups on data
that is sequential on disk than 10 random lookups.

In XiniX I can probe the first 4 ply without a problem; 5 ply takes less than 1
second, so if the search time is long enough that's no problem. 6 ply became
too costly the last time I tested.

Tony



