Computer Chess Club Archives


Subject: Re: Can someone here enhance the next SSDF list please.

Author: Eelco de Groot

Date: 18:24:48 11/19/99



On November 19, 1999 at 20:06:31, Tina Long wrote:

>The results of what I am asking could be badly misinterpreted, & could result in
>silly arguments, but if read properly would, for many here, be very
>interesting.
>
>In the discussions of "Who's best" there is rarely any consideration of the +/-
>in the SSDF list, we get statements such as
>"ProgramX is best; it's 5 points ahead of the rest."
>
>Now this is poetic, but wrong, as ProgramX's result is 2680 +/- 70.
>From the games played we can be 95% sure ProgramX is rated somewhere between
>2610 and 2750.
>
>This is not ELO, this is the progression of computers vs computers since some
>computers played some humans about 20 years ago.  The whole list was "deflated"
>by 100 points about 10 years ago, and looks like it should be deflated by
>another 100 points now.  The only real relationship to ELO we currently have is
>Rebel's small sample of Computer Human games, and as Rebel is constantly being
>improved we don't know its current rating as the rating is biased by the
>"older" Rebel results - but that's a tangent.... sorry
>
>I'll get to the point:
>When the next SSDF is released at the end of November, I'd like one of the
>smarter maths whizzes here to do the following calculations for me:
>
>Using:
>What's the improvement in rating in going from a 200 MHz to a 450 MHz?
>(Looking at the last list, it's about 70 +/- 30)
>Ditto from 486/50 and P90 to 200 or 450?
>
>Create a list of estimated ratings on a unified platform, combining (where
>applicable) the games of ProgramX on multiple platforms (many programs have been
>tested on 2 MHz levels).  The +/- needs to be stated as well as this will
>increase dramatically, particularly for ProgramY currently ranked on P90 or a
>486/50.
>
>(And where would my favourite oldie
>129 Mephisto Polgar  6502 5 MHz             1970   17  1793   41%  2036
>rank when upgraded (remembering a P450 is probably 300 - not 100 - times faster)
>2600 +- 1000 ?)
>
>Maybe deflating the 450's and using P200 as the unified platform would be best
>at this time.
>
>I realise the results would actually mean little due to the very high
>statistical variance in the results, but I would still find it an interesting
>ranking.
>
>Any volunteers to do the sums?
>Thanks guys
>
>Tina Long

Hi Tina,

I'm not very keen on volunteering for that particular job, Tina. I don't
think it is going to work very well, because some programs profit more than
others from going to a faster platform. You also already mentioned the
rating inflation; we cannot be sure how large it is, but I think it
could well be somewhere between 100 and 200 points. There is no longer any
reference from playing against humans, and results against slower computers
are exaggerated because of the "search gap".  I'm sure Chris would agree with
me, because he has brought this up in the past on the Gambitsoft Forum too. I
would like to quote him but I don't dare... Has this quoting issue been
discussed yet by the programmers?  Also, the confidence interval is only a
statistical thing; it doesn't take learning into account, for instance. And
then I'm not even talking about the reliability of AUTO232 results, which
possibly introduces systematic errors, errors that are possibly larger for
some programs.
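
For what it's worth, the purely statistical part of the question can be
sketched in a few lines. This is only a rough illustration, not how the SSDF
computes anything: the figures 2680 +/- 70 and 70 +/- 30 come from Tina's
post, while the rival rated 5 points lower, the 2600 +/- 40 starting rating
and all function names are made up for the example. It also assumes the two
error margins are independent and can be added in quadrature, which ignores
exactly the systematic effects mentioned above.

```python
import math

def interval(rating, margin):
    """Return the (low, high) range implied by rating +/- margin."""
    return (rating - margin, rating + margin)

def intervals_overlap(a, b):
    """True if two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def combine_margins(*margins):
    """Combine margins in quadrature (assumes independent errors)."""
    return math.sqrt(sum(m * m for m in margins))

def project_rating(rating, margin, gain, gain_margin):
    """Shift a rating by an estimated hardware gain and widen its margin."""
    return rating + gain, combine_margins(margin, gain_margin)

# ProgramX from the post: 2680 +/- 70, i.e. somewhere between 2610 and 2750.
program_x = interval(2680, 70)

# Hypothetical rival 5 points behind, with the same margin assumed.
rival = interval(2675, 70)
print("ProgramX range:", program_x)
print("Ranges overlap:", intervals_overlap(program_x, rival))

# Hypothetical program rated 2600 +/- 40 on 200 MHz, projected to 450 MHz
# using the rough gain of 70 +/- 30 quoted in the post.
projected, projected_margin = project_rating(2600, 40, 70, 30)
print("Projected: %d +/- %.0f" % (projected, projected_margin))
```

The margin only ever grows when two uncertain numbers are combined, which is
why the +/- on a unified list would be larger than on the original one, and
why a 5 point lead says nothing when both ranges overlap almost completely.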

No math whiz anyway, but statistics can be fun, don't you think?
Eelco



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.