Computer Chess Club Archives


Subject: Re: Yace 0.99.56 still the strongest amateur engine!? (Crafty 9.)

Author: Peter Fendrich

Date: 09:09:55 09/06/02



On September 06, 2002 at 08:22:33, pavel wrote:

>On September 05, 2002 at 15:47:24, Dann Corbit wrote:
>
>>On September 05, 2002 at 08:26:56, pavel wrote:
>>>On September 05, 2002 at 07:35:12, David Rasmussen wrote:
>>>>I don't know how reliable this tournament is. Chezzz is consistently better on
>>>>ICC against a number of the opponents that are above it in this tournament.
>>>>
>>>>/David
>>>
>>>
>>>No offense to SSDF, but I consider this tournament to be more reliable than
>>>SSDF, if you look at the format of the tournament, the rules, and the way
>>>the games are being played out.
>>>
>>>The only handicap is that not a lot of games are played by each program in
>>>each division.
>>>But it's still better than different programs playing "different numbers of
>>>games" in a rating list.
>>
>>I don't see a problem with that approach, as long as enough games are played.
>>When the number of games for some program is small, then the error bars will be
>>large.
>
>Yes, but it is a faulty method IMO.
>
>From SSDF:
>Program                                  Rating    +    -  Games  Won  Av.opp
>1 Fritz 7.0 256MB Athlon 1200 MHz          2741   30  -29    574  64%    2636
>2 Shredder 6.0 Paderb 256MB Athlon 1200    2727   34  -32    467  65%    2619
>3 Chess Tiger 14.0 CB 256MB Athlon 1200    2721   33  -32    487  63%    2627
>4 Gambit Tiger 2.0 256MB Athlon 1200       2718   31  -30    523  60%    2645
>5 Shredder 6.0 256MB Athlon 1200 MHz       2717   32  -31    505  64%    2618
>
>
>The games are not played against the same opponents (and where they are, not
>in the same numbers).
>
>For example, if Shredder 6 plays another 107 games (the number by which it
>trails Fritz 7) against opponents such as Crafty, older versions of Fritz, and
>other lower-rated programs, it will probably not only narrow the gap between
>the first and second program; Shredder 6 will most likely top Fritz 7 easily.

I don't get that. Those programs are lower rated, so Shredder 6 has to score
better against them than the ratings predict in order to shorten the gap.
Do you know that this is the case?
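
To make that point concrete, here is a minimal sketch using the standard
logistic Elo expectation (an assumption; SSDF's own rating program may compute
things differently). The 2500-rated opponent is hypothetical; the other numbers
come from the SSDF table quoted above.

```python
def expected_score(rating, opp_rating):
    """Standard logistic Elo expectation for `rating` against `opp_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

shredder = 2727       # from the quoted SSDF list
weak_opp = 2500       # hypothetical lower-rated opponent (assumption)
extra_games = 107     # the gap in games between Fritz 7 and Shredder 6

p = expected_score(shredder, weak_opp)
par_points = extra_games * p   # points needed just to keep the rating unchanged

# Sanity check against the table: Shredder's listed average opponent is 2619.
table_check = expected_score(shredder, 2619)

print(f"expected score vs {weak_opp}: {p:.1%}")
print(f"points needed in {extra_games} games to stay level: {par_points:.1f}")
print(f"expected score vs listed average opponent: {table_check:.1%}")
```

Beating weaker opponents by the margin the ratings already predict leaves the
rating where it is; only points above that par would close the gap.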

>
>It is also true for every other program in the list.
>
>Most people don't even look at the error bars. Even so, with arbitrary numbers
>of games against opponents of different strength, the error bars have little
>credibility.
>
>Playing an equal number of games will most likely end up being more precise.
>
>And since you are playing thousands of games anyway, why not play an equal
>number of games for every program against the same opponents?
>Just a thought

That is a possible setup, but what matters most is to play a wide range of
opponents and enough games in total.
The effects you are afraid of can bias the ratings if there are only a few
opponents, but with enough opponents those effects vanish. Of course, the more
unbalanced the number of games against different opponents is, the higher the
risk of biased ratings. Do some tests with, let's say, a population of 50 where
each player meets 10 opponents with 20-50 games each. Take some result, let's
say 15-5, and make it 37.5-12.5. The rating list will hardly change at all.
Maybe a point here and there.
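
The test described above could be sketched roughly as follows. Everything here
is an assumption made for illustration: the true strengths, the ring-shaped
schedule, the idealized results (each pairing scores exactly its Elo
expectation) and the Bradley-Terry maximum-likelihood fit, which is swapped in
for whatever rating program a real list would use.

```python
import math

def expected(ra, rb):
    """Logistic Elo expectation for a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

def make_results(n=50, half=5):
    """Idealized results: every pairing scores exactly its Elo expectation."""
    # Hypothetical true strengths, 2400..2792, scattered around the ring.
    true = [2792 - 8 * ((17 * i) % n) for i in range(n)]
    results = {}  # (i, j) -> (points scored by i, games played)
    for i in range(n):
        for k in range(1, half + 1):           # 5 ahead + 5 behind = 10 opponents
            j = (i + k) % n
            games = 20 + (3 * i + 5 * j) % 31  # 20 to 50 games per pairing
            results[(i, j)] = (games * expected(true[i], true[j]), games)
    return results

def fit(n, results, iters=2000):
    """Maximum-likelihood ratings via the Bradley-Terry MM iteration."""
    points = [0.0] * n
    for (i, j), (s, games) in results.items():
        points[i] += s
        points[j] += games - s
    g = [1.0] * n                              # strengths; Elo = 400 * log10(g)
    for _ in range(iters):
        denom = [0.0] * n
        for (i, j), (s, games) in results.items():
            w = games / (g[i] + g[j])
            denom[i] += w
            denom[j] += w
        g = [points[i] / denom[i] for i in range(n)]
        scale = n / sum(g)
        g = [x * scale for x in g]             # fix the arbitrary overall scale
    elo = [400.0 * math.log10(x) for x in g]
    mean = sum(elo) / n
    return [e - mean for e in elo]             # centred on zero

n = 50
base = make_results(n)
base[(0, 1)] = (15.0, 20)      # "some result, let's say 15-5"
scaled = dict(base)
scaled[(0, 1)] = (37.5, 50)    # same 75% score, 2.5 times the games

before = fit(n, base)
after = fit(n, scaled)
shifts = [abs(a - b) for a, b in zip(after, before)]
max_shift = max(shifts)
print(f"largest rating shift: {max_shift:.2f} Elo")
```

With noisy real results the numbers would differ, but the mechanism is the
same: one inflated pairing is only a small part of each player's total games.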

One of the main ideas behind the Elo system is precisely the possibility of
meeting different players and still getting comparable ratings. Otherwise the
human Elo list would not be usable at all, given the number of games played and
the fact that the players have not met the same opponents.
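
As a rough illustration of ratings staying comparable across different
opposition, the sketch below applies the usual performance-rating
approximation (average opponent rating plus the Elo difference implied by the
score) to two rows of the quoted SSDF table. This is an assumption about how to
read the table, not SSDF's actual calculation, and the percentages in the list
are rounded.

```python
import math

def performance_rating(avg_opp, score_fraction):
    """Rating at which score_fraction is the expected score against avg_opp."""
    return avg_opp + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# Rows from the quoted SSDF list: (average opponent, score fraction).
fritz = performance_rating(2636, 0.64)      # listed rating 2741
shredder = performance_rating(2619, 0.65)   # listed rating 2727

print(f"Fritz 7.0 estimate:    {fritz:.0f}")
print(f"Shredder 6.0 estimate: {shredder:.0f}")
```

Both estimates land within a few points of the listed ratings even though the
two programs faced different fields over different numbers of games.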

Peter

>
>cheers,
>pavs




Last modified: Thu, 15 Apr 21 08:11:13 -0700
