Author: Peter Fendrich
Date: 15:47:40 09/06/02
On September 06, 2002 at 12:29:37, pavel wrote:

>On September 06, 2002 at 12:09:55, Peter Fendrich wrote:
>
>>On September 06, 2002 at 08:22:33, pavel wrote:
>>
>>>On September 05, 2002 at 15:47:24, Dann Corbit wrote:
>>>
>>>>On September 05, 2002 at 08:26:56, pavel wrote:
>>>>>On September 05, 2002 at 07:35:12, David Rasmussen wrote:
>>>>>>I don't know how reliable this tournament is. Chezzz is consistently better on
>>>>>>ICC against a number of the opponents that are above it in this tournament.
>>>>>>
>>>>>>/David
>>>>>
>>>>>
>>>>>No offense to SSDF, but I consider this tournament to be more reliable than
>>>>>SSDF.
>>>>>If you look at the format of the tournament and the rules, and the way they are
>>>>>being played out.
>>>>>
>>>>>Only handicap is that, not a lot of games are played by each program in each
>>>>>division.
>>>>>But it's still better than different programs playing "different numbers of
>>>>>games" in a rating list.
>>>>
>>>>I don't see a problem with that approach, as long as enough games are played.
>>>>When the number of games for some program is small, then the error bars will be
>>>>large.
>>>
>>>Yes, but it is a faulty method IMO.
>>>
>>>From SSDF:
>>>                                         Rating    +    -  Games   Won  Av.opp
>>>1 Fritz 7.0 256MB Athlon 1200 MHz          2741   30  -29    574   64%    2636
>>>2 Shredder 6.0 Paderb 256MB Athlon 1200    2727   34  -32    467   65%    2619
>>>3 Chess Tiger 14.0 CB 256MB Athlon 1200    2721   33  -32    487   63%    2627
>>>4 Gambit Tiger 2.0 256MB Athlon 1200       2718   31  -30    523   60%    2645
>>>5 Shredder 6.0 256MB Athlon 1200 MHz       2717   32  -31    505   64%    2618
>>>
>>>
>>>The games are not played against the same opponents (if so, not the same number).
>>>
>>>For example, if shredder6 plays another 107 (which is the number of games less
>>>than Fritz7) games against opponents such as Crafty and older versions of Fritz
>>>and lower rated programs, it probably will not only shorten the gap between the
>>>first and second program but shredder6 will most likely top fritz7 easily.
>>
>>I don't get that. The programs are lower rated and Shredder6 has to get enough
>>good results (better than the ratings are saying) in order to shorten the gap.
>>Do you know that this is the case?
>
>What I am saying is that, since the games are being played randomly and
>opponents are random.
>IF Shredder6 is to play another 107 games (which is the difference between the
>number of games played by player1 and player2) with lower rated players, and
>Fritz7 is to just sit there and play no games (since it already played more
>games), chances are (very much) that Shredder6 would cross Fritz7.

Something is wrong here (if you aren't thinking of some specific players).
Shredder6 has to get relatively much better results vs lower rated players than
vs equal players in order to raise its rating. I can't see why it's easier to
raise your rating vs lower rated players than others.

>>>
>>>It is also true for every other program in the list.
>>>
>>>Most people don't even look at the error bar, even so, with random numbers of
>>>games with opponents of different strength, error bar has little credibility.
>>>
>>>Playing equal number of games, most likely, will end up being more precise.
>>>
>>>And since you are playing 1000s of games anyway, why not play equal numbers of
>>>games for all programs against the same opponents?
>>>Just a thought
>>
>>That is a possible set up but most important is to play a wide range of
>>opponents and enough games in total.
>>The effects that you are afraid of can bias the rating if there are only few
>>opponents but with enough number of opponents those effects are vanishing. But
>>of course the more unbalanced the number of games with different opponents are,
>>the higher is the risk of biased ratings. Do some tests with let's say a
>>population of 50 where each player meets 10 opponents with 20-50 games each.
>>Take some result let's say 15-5 and make that 37.5-12.5. The rating list will not
>>change a bit. Maybe some point here and there.
>
>Another point to note is the strength of the players.
>If program A gets to play more weaker players (and more games) than program B,
>the list won't make much sense then.

The risk I can see here is if the gap is __very__ wide. The ELO formula doesn't
make much sense in those cases. SSDF avoids such matches.

>>One of the main ideas behind the ELO system is just the possibility to meet
>>different players and yet have comparable ratings. Otherwise the human ELO list
>>would not be usable at all, given the number of games played and that they have
>>not met the same opponents.
>
>I don't think that human rating is "realistic".
>I don't think Kasparov is a 2800+ rated player, for what it's worth.

...and I have no opinion really... :-)

Best regards,
Peter
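
A rough sketch in Python of the point about lower rated opponents. It assumes the standard logistic Elo expectation and the common "average opponent plus 400*log10(p/(1-p))" performance approximation; SSDF's actual calculation may differ. The function names and the 2500-rated opponent are made up for illustration, and the Won percentages in the table are rounded, so the numbers are only approximate.

```python
import math

def expected_score(rating, opponent):
    """Expected score (0..1) of `rating` against `opponent` under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def performance_rating(avg_opponent, score_fraction):
    """Approximation: average opponent rating plus the Elo gap implied by the score."""
    return avg_opponent + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def add_games(games, score_fraction, avg_opponent,
              extra_games, extra_fraction, extra_opponent):
    """Pool an existing record with extra games against a single opponent rating."""
    total = games + extra_games
    points = games * score_fraction + extra_games * extra_fraction
    opp = (games * avg_opponent + extra_games * extra_opponent) / total
    return total, points / total, opp

# Shredder 6.0 Paderborn row from the quoted SSDF table (Won % is rounded).
games, won, av_opp = 467, 0.65, 2619
before = performance_rating(av_opp, won)

# Hypothetical: 107 extra games against a 2500-rated program (assumed figure).
weak = 2500
par = expected_score(before, weak)  # score the current rating already predicts

# Case 1: Shredder scores exactly what its rating predicts.
_, frac, opp = add_games(games, won, av_opp, 107, par, weak)
at_par = performance_rating(opp, frac)

# Case 2: Shredder scores ten percentage points better than predicted.
_, frac, opp = add_games(games, won, av_opp, 107, min(par + 0.10, 0.99), weak)
above_par = performance_rating(opp, frac)

print(f"expected score vs {weak}: {par:.3f}")
print(f"before: {before:.0f}  at par: {at_par:.0f}  above par: {above_par:.0f}")
```

Under these assumptions, scoring at par against the weaker program leaves the estimate within a few points of where it started. The rating only climbs if the results against the weaker program are clearly better than the rating already predicts.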
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.