Computer Chess Club Archives


Subject: Re: Yace 0.99.56 still the strongest amateur engine!? (Crafty 9.)

Author: pavel

Date: 05:22:33 09/06/02


On September 05, 2002 at 15:47:24, Dann Corbit wrote:

>On September 05, 2002 at 08:26:56, pavel wrote:
>>On September 05, 2002 at 07:35:12, David Rasmussen wrote:
>>>I don't know how reliable this tournament is. Chezzz is consistently better on
>>>ICC against a number of the opponents that are above it in this tournament.
>>>
>>>/David
>>
>>
>>No offense to SSDF, but I consider this tournament to be more reliable than
>>SSDF, if you look at the format of the tournament, the rules, and the way the
>>games are being played out.
>>
>>The only handicap is that not a lot of games are played by each program in
>>each division.
>>But it's still better than different programs playing "different numbers of
>>games" in a rating list.
>
>I don't see a problem with that approach, as long as enough games are played.
>When the number of games for some program is small, then the error bars will be
>large.

Yes, but it is a faulty method IMO.

From SSDF:
   Program                                 Rating   +    -  Games  Won  Av.opp
1  Fritz 7.0 256MB Athlon 1200 MHz           2741  30  -29    574  64%    2636
2  Shredder 6.0 Paderb 256MB Athlon 1200     2727  34  -32    467  65%    2619
3  Chess Tiger 14.0 CB 256MB Athlon 1200     2721  33  -32    487  63%    2627
4  Gambit Tiger 2.0 256MB Athlon 1200        2718  31  -30    523  60%    2645
5  Shredder 6.0 256MB Athlon 1200 MHz        2717  32  -31    505  64%    2618


The programs do not all play the same opponents (and where they do, not the
same number of games).

For example, if Shredder 6 played another 107 games (the number by which it
trails Fritz 7) against opponents such as Crafty, older versions of Fritz, and
other lower-rated programs, it would probably not only narrow the gap between
the first and second programs; Shredder 6 would most likely top Fritz 7 easily.
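
A scenario like this can be tried out with a rough sketch. It assumes the usual logistic Elo model, where a performance rating is the average opponent rating plus the rating difference implied by the score percentage. Whether SSDF computes its list exactly this way is an assumption here, and the extra 107-game block (its opponent average and score) is purely hypothetical input to be varied, not data from the list.

```python
import math

def performance_rating(avg_opp, score):
    """Average opponent rating plus the Elo difference implied by a score fraction."""
    return avg_opp + 400.0 * math.log10(score / (1.0 - score))

def combine(games_a, score_a, opp_a, games_b, score_b, opp_b):
    """Pool two blocks of games into one total, one score fraction, one opponent average."""
    total = games_a + games_b
    score = (games_a * score_a + games_b * score_b) / total
    opp = (games_a * opp_a + games_b * opp_b) / total
    return total, score, opp

# Shredder 6.0 Paderb row as listed: 467 games, 65%, average opponent 2619.
listed = performance_rating(2619, 0.65)

# HYPOTHETICAL extra block: 107 games against a weaker field.
# Both numbers below are made up; change them to try other scenarios.
extra_opp = 2450     # assumed average rating of the weaker opponents
extra_score = 0.80   # assumed score against them

total, score, opp = combine(467, 0.65, 2619, 107, extra_score, extra_opp)
after = performance_rating(opp, score)

print(f"listed row:        {listed:.0f}")
print(f"after {total} games:  {after:.0f}")
```

Under this model the outcome depends entirely on the score assumed for the extra games, so it is worth running with several values of extra_score and extra_opp.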

The same is true for every other program in the list.

Most people don't even look at the error bars. Even so, with an arbitrary
number of games against opponents of different strength, the error bars have
little credibility.
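
For readers who want to see how the error bars relate to the number of games, here is a minimal sketch. It assumes a simple binomial approximation for the score percentage (draws are ignored, which overstates the spread somewhat), a 95% interval, and the logistic Elo model. It is not taken from SSDF's documentation. For the Fritz 7.0 row it lands close to the listed margins, though that does not establish that SSDF computes them this way.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin(games, score, z=1.96):
    """Approximate (+, -) Elo margins around a score fraction.

    Assumption: each game is treated as an independent win/loss trial,
    so the standard error is sqrt(p * (1 - p) / n).
    """
    se = math.sqrt(score * (1.0 - score) / games)
    centre = elo_diff(score)
    # Clamp so the logarithm stays defined for very small samples.
    hi = min(score + z * se, 0.999)
    lo = max(score - z * se, 0.001)
    return elo_diff(hi) - centre, elo_diff(lo) - centre

# Games and score percentages copied from the SSDF rows quoted in this post.
rows = [
    ("Fritz 7.0", 574, 0.64),
    ("Shredder 6.0 Paderb", 467, 0.65),
    ("Chess Tiger 14.0 CB", 487, 0.63),
    ("Gambit Tiger 2.0", 523, 0.60),
    ("Shredder 6.0", 505, 0.64),
]

for name, games, score in rows:
    plus, minus = elo_margin(games, score)
    print(f"{name:22s} {games:4d} games  +{plus:.0f} {minus:.0f}")
```

The margin depends only on the game count and the score percentage in this approximation; it says nothing about which opponents were played.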

Playing an equal number of games will most likely end up being more precise.

And since you are playing 1000s of games anyway, why not play an equal number
of games for every program against the same opponents?

Just a thought.

cheers,
pavs



Last modified: Thu, 15 Apr 21 08:11:13 -0700
