Computer Chess Club Archives



Subject: Re: Bratko-Kopec Test - Node Counts

Author: Bruce Moreland

Date: 13:54:20 06/17/98




On June 17, 1998 at 14:14:08, Peter McKenzie wrote:

>Greetings,
>
>In a recent post Bob Hyatt posted the following results for the Bratko Kopec
>test, using Crafty to search all positions to a fixed depth of 9 plies:
>
>Positions Searched......          24
>Number right............          19
>Total Nodes Searched....  34,195,500
>
>I tried this on my program lambChop, and got the following results:
>
>Positions Searched......          24
>Number right............          18
>Total Nodes Searched.... 232,925,032
>
>So my program searched almost 7 times as many nodes!
>
>What sort of numbers do other programs give?
>Is Crafty's node count typically pretty low compared to others?
>
>Maybe mine has some bugs, or maybe I'm doing too many extensions or maybe
>my q-search is too big, or ....
>
>My search uses R=2 null move pruning, or at least its supposed to :-)
>Transposition table size was 0.5 million entries.
>
>I'm not sure what the standard way of counting nodes is, this is how I
>do it: I increment my node count in my MakeMove routine, so my node count
>includes the q-search.  I don't count nullmoves as I have a separate
>MakeNullMove routine.
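
The counting convention described above (increment the count in MakeMove, so quiescence nodes are included, but a separate MakeNullMove routine is not counted) can be sketched like this; the class and method names are illustrative, not lambChop's actual code:

```python
# Minimal sketch of the node-counting convention described above.
# All names (Searcher, make_move, make_null_move) are hypothetical.

class Searcher:
    def __init__(self):
        self.nodes = 0  # total nodes, including quiescence nodes

    def make_move(self, move):
        # Called for every real move in both the main search and the
        # quiescence search, so both contribute to the node count.
        self.nodes += 1

    def make_null_move(self):
        # Separate routine for the null move: deliberately NOT counted.
        pass

s = Searcher()
for move in ["e4", "e5", "Nf3"]:
    s.make_move(move)
s.make_null_move()
print(s.nodes)  # 3: the null move is not counted
```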

Crafty does a lot of pruning in the quiescent search, and you probably don't do
this, so your higher node count might not mean much; you could be doing pretty
well already.

Here is what I would do if I were you, just to try to understand things.

Unless it can already do this, I would modify your program so that it can run
test suites from the command line.

Then, I would take the current version, number it 000, save the source, make an
exe, call it "chop000.exe", and save it.

Then, I would go through the program and remove every extension and pruning
mechanism by commenting them out.  I'd call this 001 and do the same thing I did
before.

Then I would put the null move back in and call it 002.

Then I'd put the check extension back in and call it 003.

Then I'd put the recapture extension back in and call it 004.

Then I'd do something else and call it 005, and so on, until I got too tired to
work.

At this point you figure out when you're going to work on the program next,
which in my case would be when the kids go to bed the next day, and divide that
time by the number of versions multiplied by 300, which is the number of
positions in the WAC suite.  That gives you the number of seconds you can
afford per position.

I'd then make a batch macro that ran all of these versions on the WAC suite for
that many seconds.
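
A batch macro along these lines might look like the following sketch; the
executable names follow the chop000.exe convention above, but the -epd, -st,
and -out flags and the wac.epd path are pure assumptions, not any real engine's
options:

```python
# Sketch of a batch run over numbered engine versions.
# Flag names and the suite file name are illustrative assumptions.
import subprocess  # used only in the commented-out real run below


def batch_commands(root, start, end, suite, seconds):
    """Build one command line per engine version.

    Versions are numbered start..end and named like chop000.exe;
    each run writes its results to root.NNN (e.g. wac.000)."""
    cmds = []
    for v in range(start, end + 1):
        exe = "chop%03d.exe" % v
        out = "%s.%03d" % (root, v)
        cmds.append([exe, "-epd", suite, "-st", str(seconds), "-out", out])
    return cmds


cmds = batch_commands("wac", 0, 5, "wac.epd", 10)
for c in cmds:
    print(" ".join(c))
    # To actually run the batch overnight:
    # subprocess.run(c)
```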

If I repeated this for several days running, I would try to figure out how many
versions I produce on average and the amount of time between programming
sessions, and standardize the amount of time spent on each of these tests, so I
don't get one batch at 3 seconds per position and another at two minutes.

OK.  At this point I would start the thing running, go to bed, and hope that the
thing doesn't crash.  It might be worth running bratko-kopec at one second per
position, just to make sure the third version doesn't have a divide by zero at
the top of the search function or something.

Anyway, your batch macro captured the output from these test runs, and it
numbered all of the files, just the way your versions are numbered.  So you
have a wac.000, a wac.001, a wac.002, etc.

Now you need to make another program that eats these result files and spits out
a table.  Mine takes a "root" name, which in this case would be "wac", a start
value, an end value, and a number of seconds per move.  So if I'd done versions
022 through 028 at 10 seconds per, I'd say "collect wac 22 28 10".

This program will make a nice table that shows me how many seconds it took to
solve each position, and "---" if it couldn't solve one.
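
A collect-style table builder could be sketched as below; since the wac.NNN
file format isn't specified here, the sketch takes the parsed results directly
(version number mapped to position id mapped to solve time, None if unsolved):

```python
def solve_table(results):
    """Build one row per position, one column per version.

    results maps version number -> {position_id: seconds or None};
    unsolved positions are shown as "---", as in the table described above."""
    versions = sorted(results)
    positions = sorted({p for r in results.values() for p in r})
    rows = []
    for p in positions:
        cells = []
        for v in versions:
            t = results[v].get(p)
            cells.append("---" if t is None else "%3d" % t)
        rows.append("%-8s %s" % (p, "  ".join(cells)))
    return rows


# Two hypothetical versions on three positions:
data = {
    1: {"wac.001": 2, "wac.002": None, "wac.003": 7},
    2: {"wac.001": 1, "wac.002": 4, "wac.003": 3},
}
for row in solve_table(data):
    print(row)
```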

I'd make another program that works the same way except that the table shows how
many positions were solved in N seconds or under, so you'd have one column per
version, and ten rows if you'd run for ten seconds.
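
The second table, positions solved in N seconds or under, is just a cumulative
count per version; a sketch under the same assumed data shape:

```python
def solved_within(results, max_seconds):
    """For each t = 1..max_seconds, count positions solved in <= t seconds.

    results maps version number -> {position_id: seconds or None}.
    Returns one row per t, with one count per version."""
    versions = sorted(results)
    rows = []
    for t in range(1, max_seconds + 1):
        counts = [sum(1 for s in results[v].values()
                      if s is not None and s <= t)
                  for v in versions]
        rows.append((t, counts))
    return rows


data = {
    1: {"wac.001": 2, "wac.002": None, "wac.003": 7},
    2: {"wac.001": 1, "wac.002": 4, "wac.003": 3},
}
for t, counts in solved_within(data, 10):
    print("%2ds  %s" % (t, "  ".join("%3d" % c for c in counts)))
```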

There are more programs you can make, but these are a start.  I have another one
that tries to figure out how long it took to get to depth D, but I use this for
comparing two versions, and not N versions.

Now you run these programs and see what happened.  If the data confuses me, I
load it into Excel and chart it.

With luck, version 000 will obviously suck, and one of the other versions will
do a lot better, and you can pinpoint the version that started sucking again, so
you can figure out where you blew it.

If all of the versions suck, you can go back to very basic stuff such as move
ordering, hash table bugs, search bugs, etc.

If the null move is working properly, you should get a lot more right with
version 002 than with version 001.  If your check extension is working properly,
you might get a few more with 003 than with 002.  And if your recapture
extension is working properly, maybe you get a few more with 004 than with 003.
At some point you might find that one of your versions is really bad at solving
problems, so you can pinpoint the version with out of control extensions.

If you have a tool that tries to figure out how long it took to get to depth D
for all of these positions, the null move should get you there a lot faster,
probably several times faster.

Test suites aren't necessarily great at determining strength, but if you get 250
right with one version and 275 with the second version, the second version is
probably better.  At the very least, the second version probably has something
good in it that the first version doesn't have, so if it turns out that the
first version really does play better, you can figure out what is going on and
make something that plays well *and* scores better on WAC or whatever test
suite you feel like using.

I think this is a good use of test suites: you don't let them make your
decisions for you, but you can use them to get a clue about probable tactical
zip.

bruce


