Computer Chess Club Archives


Subject: Re: Crafty WAC Results

Author: Bruce Moreland

Date: 22:01:58 09/12/01


On September 12, 2001 at 19:28:15, Dann Corbit wrote:

>On September 12, 2001 at 19:13:13, Bruce Moreland wrote:

>>WAC is a dumb test when you have a program at this level.  You can throw out all
>>but about 10 of them, and you still have something that's hard to use to test a
>>program at a sensible time control, because the remainder are either too easy
>>(still) or too hard.
>
>I think you're wrong.
>
>In fact, I think that this test may have uncovered some problem with crafty.
>I used to get a better score on a slower machine.  I suspect some minor thing
>may have got broken.  Either that, or an evaluation term may have been changed
>that plays better in real games but plays worse on test suites.
>
>I don't think you can throw out any of them, unless it gets them right.  How
>will you know it got them right unless you run it?
>
>As a side point, sometimes chess engines discover new solutions to the problems
>when run at longer time controls.  That is an interesting finding, to me at
>least.
>
>What are you trying to accomplish when you run a test suite?

There are over 60 positions in this suite that are mates that Gerbil can resolve
in under 3 seconds on a 1.2 GHz machine.

Gerbil finds 269 correct answers in less than 3.00 seconds.  It finds 260 of
these in less than 1.00 second.

Okay, so this is a generic program with null-move R=2, and no extensions other
than check, no pruning of any sort other than null-move, and it solves 87% of
the problems in under a second.
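
The 87% figure follows from the counts above if the suite size is taken as 300 (the "all 300" mentioned further down). A quick check, as a sketch only; the variable and function names are made up for illustration:

```python
# Counts quoted in the post. SUITE_SIZE is assumed from the "all 300" remark.
SUITE_SIZE = 300
solved_under_1s = 260
solved_under_3s = 269

def solve_rate(solved, total=SUITE_SIZE):
    """Return the share of positions solved, rounded to a whole percent."""
    return round(100 * solved / total)

rate_1s = solve_rate(solved_under_1s)  # 260 of 300
rate_3s = solve_rate(solved_under_3s)  # 269 of 300
print(rate_1s, rate_3s)
```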

If all you do is count solutions, a lot of time is wasted, since you will solve
many of the problems with anything that generates most of the legal moves and
doesn't break.

If you take more time, or use a better program than Gerbil, the question is
whether you solve 295, 296, or 297.  That seems like it would leave you limited
possibilities for differentiating between versions.

The few that aren't solved become the focus of scrutiny because people want to
get all 300, like they are collecting a set of something and want to complete
it.

The easy problems can be culled once you get to a certain point.  It's certainly
not worth testing the whole set daily.  Once you cull the easy positions, some
of the remaining ones are interesting, but what are we talking about really, a
dozen of them?
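
Culling could look something like the sketch below. It assumes you already have per-position solve times from a previous run; the position IDs, the timings, and the one-second threshold are all hypothetical, not measured Gerbil output.

```python
def cull_easy(solve_times, threshold_s=1.0):
    """Keep positions that were unsolved (None) or took at least threshold_s seconds."""
    return {
        pos: t
        for pos, t in solve_times.items()
        if t is None or t >= threshold_s
    }

# Hypothetical timings in seconds; None means no solution within the time limit.
times = {
    "WAC.001": 0.01,
    "WAC.002": 4.2,
    "WAC.003": None,
    "WAC.004": 0.3,
}

hard = cull_easy(times)
print(sorted(hard))
```

Whatever survives the cull is the part of the suite that is still worth running regularly.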

I use test suites for a lot of things.  One thing I use them for is measuring
tactical improvement.  You can't do this with WAC, since you are going to get
295, 296, or 297 no matter what you do.  If you double or halve the speed, you
get the same number correct.
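
A toy illustration of that saturation, using invented solve times rather than real results: when nearly everything is solved far inside the time limit, doubling or halving the speed changes the total time spent but not the count.

```python
def solved_within(times, limit_s):
    """Count positions solved in at most limit_s seconds (None = never solved)."""
    return sum(1 for t in times if t is not None and t <= limit_s)

def scale(times, factor):
    """Model a machine that is `factor` times slower (factor < 1 means faster)."""
    return [None if t is None else t * factor for t in times]

def total_time(times):
    """Total seconds spent on the positions that were solved."""
    return sum(t for t in times if t is not None)

# Hypothetical solve times in seconds; one position is never solved.
baseline = [0.01, 0.05, 0.4, 0.9, 2.0, None]
faster = scale(baseline, 0.5)   # twice the speed
slower = scale(baseline, 2.0)   # half the speed

LIMIT_S = 10.0  # assumed per-position time limit
counts = [solved_within(run, LIMIT_S) for run in (faster, baseline, slower)]
print(counts)
```

The counts come out identical for all three runs, which is the problem: the score cannot tell the versions apart even though the time to solution differs by a factor of four between the fastest and slowest.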

If you want to measure tactical progress, there are better sets.  There are lots
of positions that will take between fifteen seconds and five minutes to solve on
decent hardware.  There are very few of these in WAC.

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700
