Computer Chess Club Archives


Subject: Re: Little Test Suite Survey ?

Author: David Dahlem

Date: 13:34:01 08/30/04


On August 30, 2004 at 16:09:27, GeoffW wrote:

>Hi
>
>I have been working on some mods to my program recently, as usual I was
>struggling a little trying to figure out the effects. I like to run a couple of
>test suites to roughly prove whether the effect is positive or negative.
>
>It got me thinking what other test suites, and test conditions people use to
>test their mods. My criteria are
>
>1) The test suite should run in less than 15 minutes. Not got enough patience I
>know :-)
>
>2) The tests should contain as many borderline Solves/Fails as possible for the
>time used, to measure the effect of any changes.
>
>Obviously the chosen test suite and parameters will vary with the strength of
>the engine being tested and the speed of the PC.
>
>Here is what I have found is best for me
>
>PC 2.2 GHz P4 using 64 Meg hash
>
>LCT2.EPD 10 secs/position  = 19 solved  16 failed
>small changes to my engine can make this easily go to 15 solved 20 Fails
>
>
>IQ4.EPD 3 secs/position  = 108 solved  80 failed
>just fixed a 1 line hash bug; previously this was 94 solved 94 Fails
>
>
>Data from other people would be interesting to compare ?
>
>
>
>         Regards Geoff

The IQ Test has been updated to IQ5. Download here ...

http://www.horizonchess.com/Jim/iq5.html

Tactical IQ5 Test for Chess Programs
This test is based on a test suite from the master section of Livshits' book
"Test Your Chess IQ". It has been debugged and improved. The positions are
carefully balanced with medium to hard examples. The test as presented here is
intended to estimate the tactical strength of chess engines. The scoring is
approximate and not intended to be too serious.

Changes: IQ to IQ2:
I dropped 131 "easy" positions solvable in 4 ply or less
I dropped 16 "hard" positions requiring 12 ply or more
I dropped 12 "avoid move" positions due to their negative focus
Resulted in 360 positions down to 201.
IQ2 to IQ3:
Andreas Hermann pointed out 6 duplicate positions. Resulted in 201 down to 195.
IQ3 to IQ4:
Jon Dart identified 7 problematic positions. Resulted in 195 down to 188.
IQ4 to IQ5:
Puzzles increased from 188 to 191. Three 12+ ply puzzles added back in. Thanks
to Dann Corbit, Kurt Utzinger, and Uri Blass for analysis. Scoring method
changed to reflect current puzzle book trends. Time is not emphasized now, but
rather the difficulty of the puzzle is rated and scored accordingly.
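
The position counts in the change log above can be cross-checked with a little
arithmetic. This is only a sketch in Python; the variable names are made up for
illustration, and every number is taken from the change log itself.

```python
# Suite size after each revision, using the counts quoted in the change log.
original = 360
iq2 = original - 131 - 16 - 12   # "easy", "hard", and "avoid move" positions dropped
iq3 = iq2 - 6                    # duplicate positions removed
iq4 = iq3 - 7                    # problematic positions removed
iq5 = iq4 + 3                    # three 12+ ply puzzles added back in

sizes = {"IQ2": iq2, "IQ3": iq3, "IQ4": iq4, "IQ5": iq5}
for name, size in sizes.items():
    print(f"{name}: {size} positions")
```

The results (201, 195, 188, 191) agree with the totals stated for each revision.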
The IQ4 test ran for 31 minutes and 20 seconds (188 positions at 10 seconds per
position); the percentage score achieved was then used to calculate the engine's
IQ4 score. IQ5 instead scores each puzzle by its ply depth:


IQ5 scoring

Ply Depth    # of puzzles    Score Value     Max Possible
    5             47              2.5             117.5
    6             45              3.0             135.0
    7             39              3.5             136.5
    8             19              4.0              76.0
    9             17              4.5              76.5
   10             15              5.0              75.0
   11              6              5.5              33.0
   12+             3              6.0              18.0
                                                  _____
                                  Total           667.5


Elo range for the test is 1995 to 2663.

There is now no time limit for the test, as opposed to the old standard of 10
seconds per position, which seemed rather artificial. Simply test an engine and
include the PC details (processor speed, hash size, etc.) and the time taken. You
might want to split the test up by ply and run the parts separately to make
scoring easier.

The new formula is:

IQ5 elo = 1995 + 2.5(5s) + 3.0(6s) + 3.5(7s) + 4.0(8s) + 4.5(9s) + 5.0(10s) +
5.5(11s) + 6.0(12s)

Where (5s), for instance, means the number of ply 5 positions solved correctly,
and so on. Note that the point value of a puzzle is half its ply depth.
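
For anyone who would rather not add this up by hand, the formula can be sketched
in Python as below. The function name `iq5_elo`, the constant names, and the
dict-based input are assumptions made for this illustration and are not part of
the test itself; the puzzle counts come from the IQ5 scoring table, with key 12
standing for the "12+" row.

```python
# Number of puzzles at each ply depth, copied from the IQ5 scoring table.
# Key 12 stands for the "12+" row.
PUZZLES_PER_PLY = {5: 47, 6: 45, 7: 39, 8: 19, 9: 17, 10: 15, 11: 6, 12: 3}
BASE_ELO = 1995


def iq5_elo(solved_per_ply):
    """Return the IQ5 Elo estimate for a dict mapping ply depth -> puzzles solved."""
    total = BASE_ELO
    for ply, solved in solved_per_ply.items():
        if ply not in PUZZLES_PER_PLY:
            raise ValueError(f"unknown ply depth: {ply}")
        if not 0 <= solved <= PUZZLES_PER_PLY[ply]:
            raise ValueError(f"ply {ply}: solved count {solved} out of range")
        total += (ply / 2) * solved  # point value is half the ply depth
    return total


# Solving nothing gives the floor; solving everything gives the ceiling.
print(iq5_elo({}))               # 1995
print(iq5_elo(PUZZLES_PER_PLY))  # 2662.5
```

The two printed values match the stated Elo range of 1995 to 2663, the upper end
being 1995 + 667.5 rounded up. An engine that solved, say, 10 of the ply 5
puzzles and 2 of the ply 8 puzzles would score 1995 + 25 + 8 = 2028.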

Information and Improvements
If you run the test and would like to share your results, please send them along.
If you find any alternative or questionable solutions, please send this data
so the test can be improved. My thanks to the posters at CCC and the WB forum
for pointing out errors and omissions. Enjoy the test.

Regards
Dave



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.