Computer Chess Club Archives


Subject: Re: Test suites - can they reliably predict ELO?

Author: Tom King

Date: 13:33:18 12/12/99



On December 11, 1999 at 18:15:46, Bruce Moreland wrote:

>On December 11, 1999 at 17:52:56, Tom King wrote:
>
>>Which of the well known test suites predicts the strength of chess programs most
>>accurately?
>>
>>I ask this, because I recently made some *slight* mods. to the evaluation
>>function in my program, Francesca. I ran the LCT-2 suite, and the results
>>indicated that it was a wash - the modification gave me about 5 ELO points,
>>apparently.
>>
>>I then ran a series of fast games against another amateur program. I realize
>>it's important to play a large number of games, to reduce the margin of error,
>>so I ran two matches of 65 games. The result was this:
>>
>>MATCH 1
>>"Normal" Francesca scored 37% against the amateur program.
>>
>>MATCH 2
>>"Modified" Francesca scored 45% against the amateur program.
>>
>>Quite a difference! It implies that the modification is worth over 50 ELO. I
>>guess I need to play more games, against a variety of programs to verify whether
>>this improvement is real, or imaginary.
>>
>>Anyhow, beware of reading too much into the Elo predictions of test suites.
>
>I don't think that even good suites such as LCT2 predict anything, especially
>when you talk about 5 Elo point differentials.  You can have differentials much
>larger than that just because of random search differences.
>
>I don't know how to do the statistics, but somehow I think that running two
>65-game matches and trying to make sense out of an 8% result difference
>won't work.
>
>bruce

OK. So: more games, against a variety of opponents, at differing time levels,
then. Right?

Rgds,
Tom
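
For what it's worth, here is a rough sketch of the statistics for the two matches quoted above. It assumes the usual logistic Elo model and treats every game as an independent win/loss trial; draws would only shrink the variance, so the error estimate is on the pessimistic side. The function names are made up for illustration and are not taken from Francesca or any other program.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_std_error(score, games):
    """Standard error of a match score fraction, assuming independent
    win/loss games (an upper bound when some games are drawn)."""
    return math.sqrt(score * (1.0 - score) / games)

def z_for_difference(score_a, games_a, score_b, games_b):
    """How many standard errors separate two independent match scores."""
    se = math.sqrt(score_std_error(score_a, games_a) ** 2 +
                   score_std_error(score_b, games_b) ** 2)
    return (score_b - score_a) / se

# Figures from the post: 37% and 45%, 65 games each.
gain = elo_diff(0.45) - elo_diff(0.37)      # roughly 58 Elo
z = z_for_difference(0.37, 65, 0.45, 65)    # roughly 0.93

print(f"implied gain: {gain:.0f} Elo")
print(f"difference is {z:.2f} standard errors")
```

Under these assumptions the 8% gap is less than one standard error, so two 65-game matches cannot tell a real improvement of this size from noise.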




Last modified: Thu, 15 Apr 21 08:11:13 -0700
