Author: Chris Carson
Date: 10:59:08 01/05/00
On January 05, 2000 at 13:25:12, Robert Hyatt wrote:
>On January 05, 2000 at 10:53:50, Bertil Eklund wrote:
>
>>On January 05, 2000 at 09:45:04, Chris Carson wrote:
>>
>>>For Elo measurements (FIDE, PCA, SSDF or combined), would a computer
>>>(or perhaps a person) get a higher rating in a tournament than in
>>>a match?
>>>
>>>My opinion is that a tournament is a better predictor of strength
>>>than a match. My reason (not based on any facts, it would be an
>>>interesting study) is that in a tournament a person (or machine) would
>>>face a broader range of styles than in a match. In a match, the person
>>>or computer might face an opponent that just plain does well against
>>>him/her/it (even Fischer had a nemesis). Also, in match play, each
>>>player can book up on the opponent and may get an advantage that might
>>>not be there in a tournament (more players to worry about).
>>>
>>>So, I think a tournament is a better measure of strength than a match.
>>>
>>>Second question: Would computer ratings benefit more from tournament
>>>play than match play? I vote that tournament play would produce higher
>>>(more accurate) ratings for computers against people than match play.
>>>
>>>Just my two cents. :)
>>>
>>>Best Regards,
>>>Chris Carson
>>Hi!
>>
>>You are right, humans play a lot better in single-game matches, and that is the
>>main reason for the discrepancy between the SSDF list and these matches,
>>often with increment or double-increment time controls.
>>
>>Regards Bertil SSDF
>
>
>Here I still disagree. The SSDF list is simply grossly inflated. Programs are
>not playing at a 2700 level, if by 2700 the word "FIDE" comes to mind. The lack
>of human competition over the last 7-8 years has caused this, as
>machine-vs-machine ratings tend to get exaggerated. I can't count the number of
>times I have made small changes to Crafty that would cause version N+1 to beat
>version N by a 60-40 margin, yet the rating remained _exactly_ the same on ICC.
>
>Most versions will beat the earlier versions by significant margins, yet the
>overall skill level gain (against humans) is lower than what is suggested by
>taking the win/lose/draw score and running it thru the Elo formula.
>
>As I have said before, the pools are totally different. The ratings are not
>comparable in any fashion until the two pools of players are merged and mingled
>enough that they can be treated equally.

Bob,

You know a lot more about this than I do. I have a lot of respect for your
opinion on this (as well as Bertil's). I do not mean this to be an attack. :)

My opinion: If the top programs played in a series of GM tournaments (not
matches), they would score 95% of the time in the range of their SSDF ratings
(plus or minus two errors of measurement).

Given (SSDF ratings):

 #  Program      Hardware            Rating    +    -  Games  Won  Oppo
 1  Tiger 12.0   128MB K6-2 450 MHz    2696   44  -40    317  72%  2533
 2  Fritz 5.32   128MB K6-2 450 MHz    2671   45  -41    297  72%  2506
 3  Nimzo 7.32   128MB K6-2 450 MHz    2663   37  -35    409  69%  2526
 5  Hiarcs 7.32  128MB K6-2 450 MHz    2636   42  -39    320  67%  2509
 6  Junior 5.0   128MB K6-2 450 MHz    2619   54  -50    190  65%  2508

Note: I only include one version of Nimzo (the highest rated).

The top expected performance (in my opinion): 2696+44+44 = 2784
The low expected performance (in my opinion): 2619-50-50 = 2519

This means that I am 95% confident that the performance of the above programs
in GM tournaments would fall between 2519 and 2784. I would expect a
performance outside this range (lower or higher) about 5% of the time. :)

So I guess that a performance of 2519 would not surprise me, nor would a
performance of 2784. A performance of 2419 (two more errors of measurement
lower) or a performance of 2872 (two more errors of measurement higher) would
surprise me.

Just my thoughts. :)

Best Regards,
Chris Carson
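
For anyone who wants to check the numbers in this post, here is a minimal Python sketch of the arithmetic. It is only an illustration, not anything SSDF or the posters used. The helper names `band` and `elo_diff` are made up for this example, the ratings and error margins are copied from the SSDF rows quoted above, and `elo_diff` assumes the common logistic form of the Elo expected-score formula to show roughly what a 60-40 result implies.

```python
import math

# (name, rating, plus margin, minus margin) copied from the SSDF rows in the post.
SSDF_ROWS = [
    ("Tiger 12.0", 2696, 44, 40),
    ("Fritz 5.32", 2671, 45, 41),
    ("Nimzo 7.32", 2663, 37, 35),
    ("Hiarcs 7.32", 2636, 42, 39),
    ("Junior 5.0", 2619, 54, 50),
]


def band(rating, plus, minus, k=2):
    """Return (low, high): the rating widened by k error margins on each side.

    Hypothetical helper; k=2 matches the 'plus or minus two' rule in the post.
    """
    return rating - k * minus, rating + k * plus


def elo_diff(score):
    """Rating difference implied by a score fraction (0 < score < 1).

    Assumes the logistic Elo model: expected = 1 / (1 + 10 ** (-diff / 400)).
    """
    return 400 * math.log10(score / (1 - score))


# Upper end comes from the top-rated program, lower end from the lowest-rated one.
high = band(*SSDF_ROWS[0][1:])[1]
low = band(*SSDF_ROWS[-1][1:])[0]
print(low, high)

# A 60-40 result between two versions, expressed as an Elo difference.
print(round(elo_diff(0.60)))
```

Calling `band` with `k=4` gives the "two more" limits mentioned in the post (2419 at the low end, 2872 at the high end). Under the logistic assumption, a 60-40 margin works out to roughly 70 rating points.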
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.