Computer Chess Club Archives


Subject: Re: Test Suites

Author: Steven Edwards

Date: 18:30:42 12/08/03



On December 08, 2003 at 20:38:03, Russell Reagan wrote:
>On December 08, 2003 at 20:27:33, Steven Edwards wrote:
>
>I like the NATS idea.
>
>>One plan is to categorize position difficulty based on
>>
>>    d = log N     d: difficulty    N: node count to solution
>
>Would the method for counting nodes be standardized? Some count nodes
>differently than others. For instance, one might not count the nodes visited
>during a null-move search and instead count the root node from which that
>null-move search was performed as a single node. Some programs do a great deal
>more work per node. Maybe more than one metric should be used. I think time is a
>better one than nodes. Nodes seem dependent upon the design philosophy of the
>engine writer.

While the concept of a node varies from program to program, it is always the
same (one hopes) within any given program.  And that's all we need to perform a
normalization.  Example: consider four programs, each of which runs the same
test suite under tournament conditions (hardware and time).  The largest common
subset of solved problems is selected from the results, and the total N (sum of
the solved position node counts) for this subset is then computed for each
program.  This gives four separate and very likely different sums.  Now, for
each program, we calculate a separate normalization factor given by the
reciprocal of its N value sum and then multiply that program's individual node
counts for the problem subset by this factor.  This gives comparable numbers
across the programs, so they can then be averaged.  These means, one for each
position in the subset, can then be fed into the log N difficulty metric.
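
The steps above can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions, not the NATS procedure itself: the
function name, the dictionary layout (program name -> position id -> node
count, with unsolved positions simply absent), and the choice of the natural
logarithm are all made up for the example, since the post does not fix any of
them.

```python
import math


def difficulty_by_position(node_counts):
    """Return {position_id: log of mean normalized node count}.

    node_counts is a hypothetical layout: {program: {position_id: nodes}},
    where a position appears only if that program solved it.
    """
    # Largest common subset: positions solved by every program.
    common = set.intersection(*(set(c) for c in node_counts.values()))

    normalized = {}
    for program, counts in node_counts.items():
        # Per-program normalization factor: reciprocal of the node-count sum
        # over the common subset.
        factor = 1.0 / sum(counts[p] for p in common)
        normalized[program] = {p: counts[p] * factor for p in common}

    # Average the normalized counts across programs, one mean per position.
    means = {
        p: sum(normalized[prog][p] for prog in normalized) / len(normalized)
        for p in common
    }

    # d = log N applied to the means (natural log is an assumption here).
    return {p: math.log(m) for p, m in means.items()}


# Made-up example: program "B" counts ten times as many nodes as "A" for the
# same work, yet both contribute identical normalized values.
example = {
    "A": {"p1": 100, "p2": 900, "p3": 50},
    "B": {"p1": 1000, "p2": 9000},
}
d = difficulty_by_position(example)
```

Because each program's normalized counts sum to 1 over the common subset, the
resulting d values are negative; only their ordering matters for ranking the
positions by difficulty.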

There are other ways of doing this, but they'll all give about the same
rankings.  Slightly modified procedures are used for positions solved by only a
subset of the programs, including positions solved by no programs at tournament
effort levels.

>The times would go down as hardware gets faster, but that is part of the point
>of test suites, to determine how computer chess has advanced, and hardware is a
>part of that equation.

And the process can also be used to normalize results by the same program
running on different contemporary platforms.

----

I hope to have the starting version (the NATS/2003) finished by the end of the
month.  It can act as a seed for test suite development during 2004.  Of course,
we'll need the NACCA membership to help.

----

Back when the first EPD test suites were published, I thought that the ICCA
should have come up with its own test suite, formally tested by its members and
with versions periodically emailed and posted (and archived).  It would have
been a big help.  I would have done it myself but was busy with other chess
topics.

So the NATS, like several other goals of the NACCA, is intended to assist the
active CC researcher in a practical manner in ways beneficial to all from
neophyte to old-hand.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
