Author: Robert Hyatt
Date: 13:34:09 01/05/00
On January 05, 2000 at 13:59:08, Chris Carson wrote:

>On January 05, 2000 at 13:25:12, Robert Hyatt wrote:
>
>>On January 05, 2000 at 10:53:50, Bertil Eklund wrote:
>>
>>>On January 05, 2000 at 09:45:04, Chris Carson wrote:
>>>
>>>>For ELO measurements (FIDE, PCA, SSDF or combined). Would a computer
>>>>(or perhaps a person) get a higher rating in a tournament than in
>>>>a match?
>>>>
>>>>My opinion is that a tournament is a better predictor of strength
>>>>than a match. My reason (not based on any facts, it would be an
>>>>interesting study) is that in a tournament a person (or machine) would
>>>>face a broader range of styles than in a match. In a match, the person
>>>>or computer might face an opponent that just plain does well against
>>>>him/her/it (even Fischer had a nemesis). Also, in match play, each
>>>>player can book up on the opponent and may get an advantage that might
>>>>not be there in a tournament (more players to worry about).
>>>>
>>>>So, I think a tournament is a better measure of strength than a match.
>>>>
>>>>Second question: Would computer ratings benefit more from tournament
>>>>play than match play? I vote that tournament play would produce higher
>>>>(more accurate) ratings for computers against people than match play.
>>>>
>>>>Just my two cents. :)
>>>>
>>>>Best Regards,
>>>>Chris Carson
>>>Hi!
>>>
>>>You are right, humans play a lot better in single-game matches, and that is the
>>>main reason for the discrepancy between the SSDF list and these matches,
>>>often with increment or double-increment time controls.
>>>
>>>Regards Bertil SSDF
>>
>>
>>Here I still disagree. The SSDF list is simply grossly inflated. Programs are
>>not playing at a 2700 level, if by 2700 the word "FIDE" comes to mind. The lack
>>of human competition over the last 7-8 years has caused this, as
>>machine-vs-machine ratings tend to get exaggerated. I can't count the number of
>>times I have made small changes to crafty that would cause version N+1 to beat
>>version N by a 60-40 margin, yet the rating remained _exactly_ the same on ICC.
>>
>>Most versions will beat the earlier versions by significant margins, yet the
>>overall skill level gain (against humans) is lower than what is suggested by
>>taking the win/lose/draw score and running it thru the Elo formula.
>>
>>As I have said before, the pools are totally different. The ratings are not
>>comparable in any fashion until the two pools of players are merged and mingled
>>enough that they can be treated equally.
>
>Bob,
>
>You know a lot more about this than I do. I have a lot of respect for
>your opinion on this (as well as Bertil's). I do not mean this to be an
>attack. :)
>
>My opinion: If the top programs played in a series of GM tournaments
>(not matches), they would score 95% of the time in the range of their
>SSDF ratings (plus or minus two errors of measurement).
>
>Given (SSDF ratings):
>1 Tiger 12.0   128MB K6-2 450 MHz  2696  44  -40  317  72%  2533
>2 Fritz 5.32   128MB K6-2 450 MHz  2671  45  -41  297  72%  2506
>3 Nimzo 7.32   128MB K6-2 450 MHz  2663  37  -35  409  69%  2526
>5 Hiarcs 7.32  128MB K6-2 450 MHz  2636  42  -39  320  67%  2509
>6 Junior 5.0   128MB K6-2 450 MHz  2619  54  -50  190  65%  2508
>
>Note: I only include one version of Nimzo (the highest rated).

That is a tad higher than I would expect, but within the margin of error I
would consider reasonable. But remember, Tiger is almost 2700 on the SSDF
list. You are saying it is almost 200 points too high. I agree.

>
>The top expected performance (in my opinion): 2696+44+44 = 2784
>The low expected performance (in my opinion): 2619-50-50 = 2519
>
>This means that I am 95% confident that the performance of
>the above programs in GM tournaments would fall between 2519 and 2784.
>I would expect a performance outside this range (lower or higher)
>about 5% of the time. :)
>
>So I guess that a performance of 2519 would not surprise me, nor would
>a performance of 2784. A performance of 2419 (two more errors of measurement
>lower) or a performance of 2872 (two more errors of measurement higher)
>would surprise me.
>
>Just my thoughts. :)
>
>Best Regards,
>Chris Carson
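
For anyone who wants to check the arithmetic in this thread, here is a rough Python sketch. It assumes the usual logistic Elo expectation formula, which nobody above actually spells out, and it reads the two columns after each rating as the plus and minus error margins, the way the quoted post does. The function names are made up for illustration only.

```python
import math

def elo_diff_from_score(score):
    """Rating gap implied by a score fraction (0 < score < 1).

    Assumes the standard logistic Elo curve; not taken from the thread.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)

def expected_score(diff):
    """Inverse of the above: expected score for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def two_error_range(entries):
    """Range used in the quoted post.

    entries: list of (rating, plus_margin, minus_margin), margins positive.
    Low end  = lowest rating  minus twice its minus margin.
    High end = highest rating plus  twice its plus margin.
    """
    top = max(entries, key=lambda e: e[0])
    bottom = min(entries, key=lambda e: e[0])
    return bottom[0] - 2 * bottom[2], top[0] + 2 * top[1]

# (rating, +, -) for the five programs listed in the quoted post
ssdf = [
    (2696, 44, 40),  # Tiger
    (2671, 45, 41),  # Fritz
    (2663, 37, 35),  # Nimzo
    (2636, 42, 39),  # Hiarcs
    (2619, 54, 50),  # Junior
]

if __name__ == "__main__":
    # A 60-40 result between two versions, run through the formula
    print(round(elo_diff_from_score(0.60)))
    # The 2519..2784 range from the quoted post
    print(two_error_range(ssdf))
```

Under that assumed formula a 60-40 margin works out to roughly 70 points, which is the kind of gain that shows up between versions but not against humans on ICC. Whether two margins on each side really give a 95% interval depends on how the margins are defined on the list, so treat the range as a reproduction of the quoted arithmetic and nothing more.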
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.