Computer Chess Club Archives

Subject: Re: Poll Question - Tournaments vs Matches

Author: Robert Hyatt
Date: 13:34:09 01/05/00
On January 05, 2000 at 13:59:08, Chris Carson wrote:

>On January 05, 2000 at 13:25:12, Robert Hyatt wrote:
>
>>On January 05, 2000 at 10:53:50, Bertil Eklund wrote:
>>
>>>On January 05, 2000 at 09:45:04, Chris Carson wrote:
>>>
>>>>For Elo measurements (FIDE, PCA, SSDF or combined), would a computer
>>>>(or perhaps a person) get a higher rating in a tournament than in
>>>>a match?
>>>>
>>>>My opinion is that a tournament is a better predictor of strength
>>>>than a match.  My reason (not based on any facts, it would be an
>>>>interesting study) is that in a tournament a person (or machine) would
>>>>face a broader range of styles than in a match.  In a match, the person
>>>>or computer might face an opponent that just plain does well against
>>>>him/her/it (even Fischer had a nemesis).  Also, in match play, each
>>>>player can book up on the opponent and may get an advantage that might
>>>>not be there in a tournament (more players to worry about).
>>>>
>>>>So, I think a tournament is a better measure of strength than a match.
>>>>
>>>>Second question:  Would computer ratings benefit more from tournament
>>>>play than match play?  I vote that tournament play would produce higher
>>>>(more accurate) ratings for computers against people than match play.
>>>>
>>>>Just my two cents.  :)
>>>>
>>>>Best Regards,
>>>>Chris Carson
>>>Hi!
>>>
>>>You are right: humans play a lot better in single-game matches, and that is
>>>the main reason for the discrepancy between the SSDF list and these matches,
>>>which are often played with increment or double-increment time controls.
>>>
>>>Regards Bertil SSDF
>>
>>
>>Here I still disagree.  The SSDF list is simply grossly inflated.  Programs are
>>not playing at a 2700 level, if by 2700 the word "FIDE" comes to mind.  The lack
>>of human competition over the last 7-8 years has caused this, as
>>machine-vs-machine ratings tend to get exaggerated.  I can't count the number of
>>times I have made small changes to Crafty that would cause version N+1 to beat
>>version N by a 60-40 margin, yet the rating remained _exactly_ the same on ICC.
>>
>>Most versions will beat the earlier versions by significant margins, yet the
>>overall skill level gain (against humans) is lower than what is suggested by
>>taking the win/loss/draw score and running it through the Elo formula.
>>
>>As I have said before, the pools are totally different.  The ratings are not
>>comparable in any fashion until the two pools of players are merged and mingled
>>enough that they can be treated equally.
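To put a number on the 60-40 margin mentioned above, here is a minimal Python sketch of the logistic form of the Elo curve, in which an expected score E corresponds to a rating gap D via E = 1 / (1 + 10^(-D/400)). The logistic form is an assumption here: many rating lists use it, while FIDE's own conversion tables are derived from a normal distribution and give slightly different figures. The function names are illustrative only.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score of the stronger side, given its rating advantage (logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_from_score(score: float) -> float:
    """Rating advantage implied by a score fraction; the inverse of expected_score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Version N+1 scoring 60-40 against version N (illustrative match result).
    gap = rating_diff_from_score(0.60)
    print(f"A 60% score implies a gap of about {gap:.0f} Elo points")
```

Run as a script, it reports a gap of roughly 70 points for a 60% score, which is the kind of gain that, as described above, did not show up in Crafty's ICC rating against mixed opposition.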
>
>Bob,
>
>You know a lot more about this than I do.  I have a lot of respect for
>your opinion on this (as well as Bertil). I do not mean this to be an
>attack.  :)
>
>My opinion: If the top programs played in a series of GM tournaments
>(not matches), 95% of the time they would score within the range of their
>SSDF ratings (plus or minus two errors of measurement).
>
>Given (SSDF ratings):
>   Program      Hardware              Rating   +    -    Games  Score  Avg opp
>1  Tiger 12.0   128MB K6-2 450 MHz    2696    44   -40   317    72%    2533
>2  Fritz 5.32   128MB K6-2 450 MHz    2671    45   -41   297    72%    2506
>3  Nimzo 7.32   128MB K6-2 450 MHz    2663    37   -35   409    69%    2526
>5  Hiarcs 7.32  128MB K6-2 450 MHz    2636    42   -39   320    67%    2509
>6  Junior 5.0   128MB K6-2 450 MHz    2619    54   -50   190    65%    2508
>
>Note: I only include one version of Nimzo (the highest rated).


That is a tad higher than I would expect, but within the margin of error I
would consider reasonable.  But remember, Tiger is almost 2700 on the SSDF
list.  You are saying it is almost 200 points too high.  I agree.
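Where the lopsided "+" and "-" margins in the SSDF table come from can be approximated with a simple model. The sketch below assumes each game is an independent win/loss trial (draws ignored), takes the standard error of a score fraction p over n games as sqrt(p(1 - p)/n), and maps a 95% band (z = 1.96) back onto the logistic Elo curve. This is an illustrative assumption, not necessarily how the SSDF computes its margins, and the helper names are hypothetical.

```python
import math


def elo_gap(score: float) -> float:
    """Rating advantage implied by a score fraction (logistic Elo curve)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def approx_margins(score: float, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate (+, -) Elo margins for a score fraction over a number of games.

    Assumes independent win/loss games (draws ignored), so the standard
    error of the score fraction is sqrt(p * (1 - p) / n).
    """
    se = math.sqrt(score * (1.0 - score) / games)
    if not (0.0 < score - z * se and score + z * se < 1.0):
        raise ValueError("score band falls outside (0, 1); too few games")
    centre = elo_gap(score)
    plus = elo_gap(score + z * se) - centre
    minus = centre - elo_gap(score - z * se)
    return plus, minus


if __name__ == "__main__":
    # Score fractions and game counts copied from the SSDF excerpt quoted above.
    for name, score, games in [("Tiger 12.0", 0.72, 317), ("Junior 5.0", 0.65, 190)]:
        plus, minus = approx_margins(score, games)
        print(f"{name}: +{plus:.0f} / -{minus:.0f}")
```

With those inputs it gives roughly +45/-41 for Tiger and +55/-50 for Junior, close to the listed +44/-40 and +54/-50; ignoring draws slightly overstates the spread. It also shows why the margins are lopsided: the rating gap grows faster per unit of score the further the score is from 50%, so a symmetric band in score becomes a wider "+" than "-" in rating.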

>
>The top expected performance (in my opinion): 2696 + 44 + 44 = 2784
>The low expected performance (in my opinion): 2619 - 50 - 50 = 2519
>
>This means that I am 95% confident that the performance of
>the above programs in GM tournaments would fall between 2519 and 2784.
>I would expect a performance outside this range (lower or higher)
>about 5% of the time.  :)
>
>So I guess that a performance of 2519 would not surprise me, nor would
>a performance of 2784.  A performance of 2419 (two more errors of measurement
>lower) or a performance of 2872 (two more errors of measurement higher)
>would surprise me.
>
>Just my thoughts.  :)
>
>Best Regards,
>Chris Carson
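Chris's ranges can be reproduced from the table. One reading, assumed here, is to take the widest band over all five programs: the highest value of rating plus k times the "+" margin, and the lowest value of rating minus k times the "-" margin. With k = 2 this happens to pick Tiger's upper bound and Junior's lower bound, giving 2519 to 2784, and k = 4 gives the 2419 and 2872 figures. The list and function names are illustrative.

```python
# (program, rating, plus margin, minus margin) from the SSDF excerpt quoted above.
SSDF_EXCERPT = [
    ("Tiger 12.0", 2696, 44, 40),
    ("Fritz 5.32", 2671, 45, 41),
    ("Nimzo 7.32", 2663, 37, 35),
    ("Hiarcs 7.32", 2636, 42, 39),
    ("Junior 5.0", 2619, 54, 50),
]


def expected_range(entries, k):
    """Widest band covered by rating + k*plus and rating - k*minus across all entries."""
    top = max(rating + k * plus for _name, rating, plus, _minus in entries)
    low = min(rating - k * minus for _name, rating, _plus, minus in entries)
    return low, top


if __name__ == "__main__":
    print(expected_range(SSDF_EXCERPT, 2))  # (2519, 2784)
    print(expected_range(SSDF_EXCERPT, 4))  # (2419, 2872)
```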
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.