Computer Chess Club Archives



Subject: What an Elo number means in a test like SSDF

Author: Rolf Tueschen

Date: 05:23:22 06/11/04



On June 11, 2004 at 08:04:02, Tony Werten wrote:

>On June 09, 2004 at 20:24:52, Dann Corbit wrote:
>
>>On June 09, 2004 at 19:27:37, Derek Paquette wrote:
>>
>>>On June 09, 2004 at 19:23:11, Dann Corbit wrote:
>>>
>>>>On June 09, 2004 at 19:07:39, Derek Paquette wrote:
>>>>
>>>>>On June 09, 2004 at 18:49:40, Jorge Pichard wrote:
>>>>>
>>>>>>Taking on a 3400+ AMD 64 with 2 GB RAM and Fritz 8
>>>>>>http://www.chessbase.com/newsdetail.asp?newsid=1703
>>>>>
>>>>>This is very annoying for someone who is a chess enthusiast like myself.
>>>>>
>>>>>Why would the company that is marketing this laptop RISK using a program that
>>>>>is 40 Elo LOWER?
>>>>>I just don't get it.
>>>>>I think it comes down to plain old ignorance of chess programs.
>>>>>Why NOT use Shredder 8?
>>>>>This is very frustrating, because we never get to see Shredder 8 in action vs
>>>>>grandmasters at tournament time controls.
>>>>
>>>>Probably, they have a good reason.
>>>>For instance, they might take 7.04 and analyze every game she has ever played
>>>>at very slow time controls.  Now they have a database and an expected response
>>>>for most of the moves she is likely to make.
>>>>
>>>>Perhaps the analysis started long ago.  They know for sure exactly how it would
>>>>work with 7.04.
>>>>
>>>>Bleeding edge is not always the best thing, if you want a reliable outcome.
>>>>For the same reason, we won't always see the fastest possible hardware.  It
>>>>could be that the fastest stuff has not been tested.  It would be a mistake to
>>>>try an untested system.
>>>
>>>That would be very true if Shredder 8 had been released last week. HOWEVER,
>>>Shredder 8 has been out long enough for the following to happen:
>>>SSDF has had enough time to test it.
>>>ICC is full of Shredder 8 (and it is turning humans into mincemeat).
>>>
>>>That is enough to say that the program is well tested, and that it would kick
>>>the crap out of a human, because it's certainly beating Fritz 8.
>>
>>It is not known whether Fritz 8 would do better against humans than Shredder 8.
>>
>>We might surmise it from SSDF and WMCCC results, but that is really an
>>extrapolation that may not be correct.
>>
>>At any rate, even the SSDF Elo rating does not decide which one is
>>stronger:
>>
>>      THE SSDF RATING LIST 2004-04-22   97872 games played by  264 computers
>>                                           Rating   +     -  Games   Won  Oppo
>>                                           ------  ---   --- -----   ---  ----
>>   1 Shredder 8.0 CB  256MB Athlon 1200 MHz  2818   34   -32   481   70%  2673
>>   2 Shredder 7.04 UCI 256MB Athlon 1200 MHz 2809   24   -23   967   71%  2648
>>   3 Deep Fritz 8.0  256MB Athlon 1200 MHz   2790   26   -25   855   72%  2625
>>
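The "Won" and "Oppo" columns in the excerpt are tied to the rating by the Elo expectation curve. A minimal sketch, assuming the standard logistic model E = 1 / (1 + 10^(-D/400)) and plugging in the *average* opponent rating (a simplification; a real list computes expectations per opponent, and SSDF's exact procedure is not stated in this thread). `performance_rating` is a hypothetical helper name:

```python
import math

# Rows from the SSDF excerpt: (engine, listed rating, score fraction, average opponent)
ROWS = [
    ("Shredder 8.0", 2818, 0.70, 2673),
    ("Shredder 7.04", 2809, 0.71, 2648),
    ("Deep Fritz 8.0", 2790, 0.72, 2625),
]


def performance_rating(score: float, opp_avg: float) -> float:
    """Invert the logistic Elo expectation: D = 400 * log10(p / (1 - p))."""
    return opp_avg + 400 * math.log10(score / (1 - score))


for name, listed, score, opp in ROWS:
    approx = performance_rating(score, opp)
    print(f"{name:15s} listed {listed}  approx {approx:.0f}")
```

The approximations land within a few points of the listed ratings, which shows that a list rating is essentially "score percentage against a given field", nothing more.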
>>Shredder 8 lower bound:   2818 - 32 = 2786
>>Deep Fritz 8 upper bound: 2790 + 26 = 2816
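The arithmetic above is an interval-overlap check: each engine's range is its rating minus the "-" column up to its rating plus the "+" column, and the question is whether the two ranges share any points. A small sketch using only the numbers printed in the excerpt (`interval` and `overlaps` are hypothetical helper names):

```python
# (rating, plus, minus) exactly as printed in the SSDF excerpt
SHREDDER_8 = (2818, 34, 32)
DEEP_FRITZ_8 = (2790, 26, 25)


def interval(entry):
    """Return (low, high) = (rating - minus, rating + plus)."""
    rating, plus, minus = entry
    return rating - minus, rating + plus


def overlaps(a, b):
    """True if the closed ranges a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


s = interval(SHREDDER_8)    # (2786, 2852)
f = interval(DEEP_FRITZ_8)  # (2765, 2816)
print(s, f, overlaps(s, f))
```

The check only says that the ranges overlap; what that overlap implies about which engine is stronger is exactly what the replies below argue about.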
>
>1+1=2, also true and also irrelevant :)
>
>The fact that the two ranges overlap doesn't mean Shredder isn't stronger.
>
>It says something about how uncertain it is that Shredder is stronger.


No, it doesn't. Corbit is quite right with his note. The margin expresses the
uncertainty of the Elo number found for Shredder, not yet the uncertainty of
Shredder being better than Fritz. The difference in interpretation is not
trivial. An unstable Elo result for an engine means uncertainty in our
knowledge. Test theory says that a result with such a large deviation is highly
uncertain. You miss the meaning of deviations. A large deviation means that the
final (true) result can lie anywhere within the range of that deviation! You
seem to think that the (true) result is already known and that only in the
extreme regions is there still a white zone of uncertainty. That is wrong. The
uncertainty applies directly to the _main_ result in the middle of your nicely
drawn mountains below, where you assume certainty IMO. But exactly this is
wrong.



>
>In this case I guesstimate that the correct expression would be: Shredder is
>stronger than Fritz with 85% certainty.
>
>ie:
>
>                o
>     xx        ooo
>    xxxx      ooooo
>   xxxxxx    ooooooo
>  xxxxxxxx  ooooooooo
>xxxxxxxxxx??ooooooooooo
>
>x+? = uncertainty about elo Fritz
>o+? = uncertainty about elo Shredder
>? = chance that Fritz is stronger than Shredder
>o = chance that Shredder is stronger than Fritz
>
>Tony
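The kind of figure Tony's 85% guess gestures at can be computed under explicit assumptions; whether such a number is meaningful is exactly what this reply disputes. A sketch, assuming the printed +/- are roughly symmetric 95% bounds (so sigma is about the average margin divided by 1.96) and that the two ratings have independent normal errors. Independence is questionable, since SSDF engines share opponents and play each other. `prob_stronger` is a hypothetical helper:

```python
import math


def sigma_from_margin(plus: float, minus: float, z: float = 1.96) -> float:
    """Assumption: the printed +/- are roughly symmetric 95% bounds."""
    return (plus + minus) / 2 / z


def prob_stronger(r_a, plus_a, minus_a, r_b, plus_b, minus_b) -> float:
    """P(true rating A > true rating B) under independent normal errors."""
    sd_diff = math.hypot(sigma_from_margin(plus_a, minus_a),
                         sigma_from_margin(plus_b, minus_b))
    z = (r_a - r_b) / sd_diff
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


p = prob_stronger(2818, 34, 32, 2790, 26, 25)
print(f"P(Shredder 8 > Deep Fritz 8) ~ {p:.2f}")
```

Under these assumptions the result is roughly 0.9. Dropping the independence assumption or using a different confidence level would change it.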




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.