Computer Chess Club Archives


Subject: Re: Test suites - can they reliably predict ELO?

Author: Enrique Irazoqui

Date: 03:22:29 12/12/99

On December 11, 1999 at 19:46:50, Bertil Eklund wrote:

>On December 11, 1999 at 17:52:56, Tom King wrote:
>
>>Which of the well-known test suites predicts the strength of chess programs most
>>accurately?
>>
>>I ask this because I recently made some *slight* mods to the evaluation
>>function in my program, Francesca. I ran the LCT-2 suite, and the results
>>indicated that it was a wash: the modification gave me about 5 ELO points,
>>apparently.
>>
>>I then ran a series of fast games against another amateur program. I realize
>>it's important to play a large number of games, to reduce the margin of error,
>>so I ran two matches of 65 games. The result was this:
>>
>>MATCH 1
>>"Normal" Francesca scored 37% against the amateur program.
>>
>>MATCH 2
>>"Modified" Francesca scored 45% against the amateur program.
>>
>>Quite a difference! It implies that the modification is worth over 50 ELO. I
>>guess I need to play more games, against a variety of programs, to verify
>>whether this improvement is real or imaginary.
>>
>>Anyhow, beware of reading too much into ELO predictions of test suites.
>>
>>Cheers All,
>>Tom
>
>Hi!
>
>Mr Irazoqui's secret test suite is very impressive! I think it's about 111
>positions. He can predict a new program's strength better than any other test I
>have seen so far. If his predictions remain as good as his previous results, I
>hope we can stop publishing our list and just play for fun.
>
>Bertil SSDF

You are kidding, of course. A test can try to predict performance, but it can
never supplant real life. Besides, the beauty and the interest are in the games,
and that's why I become like an owl staring at the screen when I play my own
tournaments, almost fulfilling my wish of being the SSDF, but no way... :(
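
As a side note on Tom's numbers, here is a rough sketch of the standard Elo
arithmetic, not his actual test setup: it uses the usual logistic expected-score
formula and a simple binomial error estimate. The function names are only
illustrative, and the error term ignores draws, which would shrink the variance
somewhat.

    import math

    def elo_diff_from_score(score):
        # Elo difference implied by an expected score, per the usual logistic model.
        return -400.0 * math.log10(1.0 / score - 1.0)

    def score_std_error(score, games):
        # One-sigma error of a match score, treating each game as win/loss
        # (ignores draws, so it slightly overstates the noise).
        return math.sqrt(score * (1.0 - score) / games)

    p1, p2, n = 0.37, 0.45, 65          # Tom's two 65-game matches

    gain = elo_diff_from_score(p2) - elo_diff_from_score(p1)
    print(round(gain))                  # about 58 ELO -- "over 50", as Tom says

    se = math.sqrt(score_std_error(p1, n) ** 2 + score_std_error(p2, n) ** 2)
    print(round(se * 100))              # about 9 percentage points of one-sigma
                                        # noise on an 8-point score gap, so more
                                        # games are needed to call the gain real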

Yesterday I was digging through your lists over the years, trying to figure out
which programs made real advances over the last few years and which ones didn't
(say, compare Mchess 5 and Mchess 8, Hiarcs 4 and H732, F3 and F532, Rebel 6 and
Rebel 9, etc.). I also compared the ratings of programs the first and last time
you list them, to get an idea of the influence of learners and tuning: 18 points
on average, not as much as I expected. Interesting results, to me at least.

Bertil, your Swedish list is invaluable for many reasons.

Enrique



