Computer Chess Club Archives


Subject: Re: Test suites - can they reliably predict ELO?

Author: Enrique Irazoqui

Date: 11:59:35 12/12/99


On December 12, 1999 at 14:27:39, John Warfield wrote:

>On December 12, 1999 at 06:26:50, Enrique Irazoqui wrote:
>
>>On December 11, 1999 at 20:18:45, John Warfield wrote:
>>
>>>On December 11, 1999 at 19:46:50, Bertil Eklund wrote:
>>>
>>>>On December 11, 1999 at 17:52:56, Tom King wrote:
>>>>
>>>>>Which of the well known test suites predicts the strength of chess programs most
>>>>>accurately?
>>>>>
>>>>>I ask this, because I recently made some *slight* mods. to the evaluation
>>>>>function in my program, Francesca. I ran the LCT-2 suite, and the results
>>>>>indicated that it was a wash - the modification gave me about 5 ELO points,
>>>>>apparently.
>>>>>
>>>>>I then ran a series of fast games against another amateur program. I realize
>>>>>it's important to play a large number of games, to reduce the margin of error,
>>>>>so I ran two matches of 65 games. The result was this:
>>>>>
>>>>>MATCH 1
>>>>>"Normal" Francesca scored 37% against the amateur program.
>>>>>
>>>>>MATCH 2
>>>>>"Modified" Francesca scored 45% against the amateur program.
>>>>>
>>>>>Quite a difference! It implies that the modification is worth over 50 ELO. I
>>>>>guess I need to play more games, against a variety of programs to verify whether
>>>>>this improvement is real, or imaginary.
>>>>>
>>>>>Anyhow, beware of reading too much into ELO predictions of test suites.
>>>>>
>>>>>Cheers All,
>>>>>Tom
>>>>
>>>>Hi!
>>>>
>>>>Mr Irazoqui's secret test suite is very impressive! I think it's about 111
>>>>positions. He can predict a new program's strength better than any other test I
>>>>have seen so far. If his predictions remain as good as his previous results, I
>>>>hope we can stop publishing our list and just play for fun.
>>>>
>>>>Bertil SSDF
>>>
>>>  Why is this Test secret??
>>
>>I don't publish my test because the moment I do, it will be cooked and become
>>worthless. This is one of the reasons that well-known test suites are
>>inaccurate, aside from the fact that they have few positions, some of these
>>positions are ambiguous or plain wrong, and the rating formula doesn't make
>>sense. Results are so erratic and unrealistic that, for example, Fritz comes
>>out best in the BS test and worst in the BT. Etc. etc.
>>
>>A couple of months ago we started talking about test suites during one of the
>>Rebel GM games at ICC, and a programmer was straightforward enough to say that a
>>test won't work because he would cook it next day...
>>
>>My test has by now 130 positions not included in any other test and took me 11
>>months so far to put it together, and quite a bit longer to figure it out, so
>>you can imagine that I feel quite reluctant to throw it to the garbage. But it
>>is a bit of a Catch-22 situation: if I don't publish it, no one will trust it;
>>if I do, no one should. :(
>>
>>In case you are interested, these are my current results for the latest programs:
>>
>>PIII-500    Test    SSDF scale
>>RT            0        2691
>>CM6K        -16        2675
>>N732        -27        2664
>>F6-F6a      -33        2658
>>F532        -33        2658
>>H732        -38        2653
>>J5          -70        2621
>>C171       -104        2587
>>
>>Now I am running it with Shredder 4, Genius 6.5 and Zarkov 5, but it takes 2
>>boooooring days per program and I feel quite lazy at the moment.
>>
>>Enrique
>
>
>  Hi Enrique
>
>
>  If you ever decide to release this test, I would like to be one of the first
>to receive it.

Sure, but don't hold your breath... :)

Enrique
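
Editorial note: the arithmetic behind Tom King's quoted match results (37% versus 45% over 65 games each) can be checked with a short sketch. It assumes the standard logistic Elo model, which the thread does not spell out, and it treats every game as a plain win or loss, which gives an upper bound on the noise because draws only reduce the variance. The function names are hypothetical and chosen for illustration only.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1),
    assuming the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_std_error(score, games):
    """Upper bound on the standard error of a match score.
    Treats each game as win/loss; draws would only make it smaller."""
    return math.sqrt(score * (1.0 - score) / games)


GAMES = 65                      # games per match, from the quoted post
normal, modified = 0.37, 0.45   # Francesca's scores in match 1 and match 2

# Elo gain implied by moving from 37% to 45% against the same opponent.
gain = elo_diff(modified) - elo_diff(normal)

# Standard error of the difference between two independent match scores.
se_diff = math.sqrt(
    score_std_error(normal, GAMES) ** 2 + score_std_error(modified, GAMES) ** 2
)

print(f"Implied gain: {gain:.0f} Elo")
print(f"Score difference: {modified - normal:.2f} +/- {se_diff:.2f} (one standard error)")
```

Under these assumptions the implied gain is a little under 58 Elo, which matches the "over 50 ELO" in the post. The 8-point score difference is smaller than one standard error of that difference (about 8.6 points), so two 65-game matches cannot separate a real improvement from chance, which is why more games against more opponents are needed.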




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.