Computer Chess Club Archives


Subject: Re: What a Elo number means in a Test Like SSDF

Author: Tony Werten

Date: 11:19:53 06/11/04


On June 11, 2004 at 08:23:22, Rolf Tueschen wrote:

>On June 11, 2004 at 08:04:02, Tony Werten wrote:
>
>>On June 09, 2004 at 20:24:52, Dann Corbit wrote:
>>
>>>On June 09, 2004 at 19:27:37, Derek Paquette wrote:
>>>
>>>>On June 09, 2004 at 19:23:11, Dann Corbit wrote:
>>>>
>>>>>On June 09, 2004 at 19:07:39, Derek Paquette wrote:
>>>>>
>>>>>>On June 09, 2004 at 18:49:40, Jorge Pichard wrote:
>>>>>>
>>>>>>>Taking on a 3400+ AMD 64 with 2 GB RAM and Fritz 8
>>>>>>>http://www.chessbase.com/newsdetail.asp?newsid=1703
>>>>>>
>>>>>>this is very annoying for someone who is a chess enthusiast like myself.
>>>>>>
>>>>>>why would the company that is marketing this laptop RISK using a program that
>>>>>>is 40 elo LOWER?
>>>>>>I just don't get it,
>>>>>>I think it comes down to plain old ignorance of chess programs
>>>>>>why NOT use shredder 8?
>>>>>>this is very frustrating, because we never get to see shredder 8 in action vs
>>>>>>grandmasters at tournament time controls.
>>>>>
>>>>>Probably, they have a good reason.
>>>>>For instance, they might take 7.04 and analyze every game she has ever played
>>>>>at very slow time control.  Now, they have a database and expected response for
>>>>>most of the moves she is likely to make.
>>>>>
>>>>>Perhaps the analysis started long ago.  They know for sure exactly how it would
>>>>>work with 7.04
>>>>>
>>>>>Bleeding edge is not always the best thing, if you want a reliable outcome.
>>>>>For the same reason, we won't always see the fastest possible hardware.  It
>>>>>could be that the fastest stuff has not been tested.  It would be a mistake to
>>>>>try an untested system.
>>>>
>>>>that is very true, if shredder 8 was released last week, HOWEVER,
>>>>shredder 8 has been released long enough for the following to happen,
>>>>SSDF has had enough time to test it
>>>>ICC is full of shredder 8 (and it is turning humans into mincemeat)
>>>>
>>>>that is enough to say that the program is well tested, and that it would kick
>>>>the crap out of a human, because it's certainly beating fritz 8 around.
>>>
>>>It is not known whether Fritz 8 would do better against humans than Shredder 8.
>>>
>>>We might surmise it from SSDF and WMCCC results, but that is really an
>>>extrapolation that may not be correct.
>>>
>>>At any rate, even the SSDF Elo strength rating does not decide who is
>>>stronger:
>>>
>>>      THE SSDF RATING LIST 2004-04-22   97872 games played by  264 computers
>>>                                           Rating   +     -  Games   Won  Oppo
>>>                                           ------  ---   --- -----   ---  ----
>>>   1 Shredder 8.0 CB  256MB Athlon 1200 MHz  2818   34   -32   481   70%  2673
>>>   2 Shredder 7.04 UCI 256MB Athlon 1200 MHz 2809   24   -23   967   71%  2648
>>>   3 Deep Fritz 8.0  256MB Athlon 1200 MHz   2790   26   -25   855   72%  2625
>>>
>>>2818 - 32 = 2786
>>>2790 + 26 = 2816
>>
>>1+1=2, also true and also irrelevant :)
>>
>>The fact that the 2 numbers overlap doesn't mean Shredder isn't stronger.
>>
>>It says something about the uncertainty that Shredder is stronger.
>
>
>No, it doesn't. Corbit is quite right with his note. It reflects the uncertainty of
>the measured Elo number for Shredder, not yet the uncertainty of it being better
>than Fritz.

Read again. Your comment doesn't make sense. The fact that the numbers overlap
says something about the elo of Shredder?

Depending on which k factor is used, the elo of Shredder is, with 95% certainty,
within the range 2818-32 to 2818+34, and Fritz's elo within 2790-25 to 2790+26
(where k=2 corresponds to 95% certainty).


The fact that they overlap says something about the chance that one engine is
better than the other, which is directly related to the standard deviation.
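
A rough sketch of that relation in Python. The assumptions here are mine and not part of the SSDF list: the +/- columns are read as 95% bounds (about 1.96 standard deviations), the two rating errors are treated as independent and normally distributed, and the helper names are made up for illustration. SSDF ratings come from a shared pool of games, so independence is only an approximation.

```python
import math

def sigma_from_bounds(plus, minus, z=1.96):
    """Approximate standard deviation from +/- bounds, assumed to be 95% bounds."""
    return (plus + abs(minus)) / 2.0 / z

def prob_a_stronger(rating_a, sigma_a, rating_b, sigma_b):
    """P(true rating of A > true rating of B), assuming independent normal errors."""
    diff = rating_a - rating_b
    # Variances add for the difference of two independent normal variables.
    sigma_diff = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    # Standard normal CDF evaluated at diff / sigma_diff.
    return 0.5 * (1.0 + math.erf(diff / (sigma_diff * math.sqrt(2.0))))

# Figures taken from the SSDF list quoted in this thread.
shredder_sigma = sigma_from_bounds(34, -32)
fritz_sigma = sigma_from_bounds(26, -25)
p = prob_a_stronger(2818, shredder_sigma, 2790, fritz_sigma)
print(f"P(Shredder 8.0 stronger than Deep Fritz 8.0) ~ {p:.2f}")
```

Under these assumptions the result lands at roughly 90%, so overlapping ranges still leave Shredder the clear favourite, just not a certain one.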

>The difference in the interpretation is not trivial. The
>unstable Elo result for an engine means uncertainty in our knowledge. Test theory
>says that a result is most uncertain with such a high deviation.

No it doesn't. It means that for a 95% certainty you have to take a big range
around the mean.


>You miss the
>meaning of deviations. High deviations mean that the final (true) result can be
>everywhere in the region of the actual deviation!

Nope :) The true result can be anything. The standard deviation just says
something about the chance it can be a certain number.
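
To put numbers on that, here is a small Python sketch. It assumes (my assumption, not something the SSDF list states) that Shredder's true rating is normally distributed around 2818 with a standard deviation derived from reading the +34/-32 columns as 95% bounds. The function names are hypothetical.

```python
import math

def normal_cdf(x, mean, sigma):
    """P(value <= x) for a normal distribution with the given mean and sigma."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def prob_between(lo, hi, mean, sigma):
    """P(lo <= value <= hi) under the same normal model."""
    return normal_cdf(hi, mean, sigma) - normal_cdf(lo, mean, sigma)

mean = 2818                      # Shredder 8.0 rating from the list
sigma = (34 + 32) / 2.0 / 1.96   # assumed: +/- columns are 95% bounds

inside = prob_between(2818 - 32, 2818 + 34, mean, sigma)
below_fritz = normal_cdf(2790, mean, sigma)

print(f"chance the true rating is inside the quoted range: {inside:.2f}")
print(f"chance the true rating is below 2790:              {below_fritz:.2f}")
```

The true rating is never confined to the quoted range. Values outside it are just less likely, and the standard deviation is what tells you how much less.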

>You seem to think that the
>(true) result is already known but in the extreme regions there is still a white
>zone of uncertainty.

You don't seem to understand. Maybe you should try to understand it first,
before you start to correct someone who does this kind of calculation for his
work ;)

Tony

>That is wrong. The uncertainty is directly related to the
>_main_ result in the middle of your nicely put mountains below - where you
>assume certainty IMO. But exactly this is wrong.
>
>
>
>>
>>In this case I guesstimate that the correct expression would be: Shredder is
>>stronger than Fritz with 85% certainty.
>>
>>ie:
>>
>>                o
>>     xx        ooo
>>    xxxx      ooooo
>>   xxxxxx    ooooooo
>>  xxxxxxxx  ooooooooo
>>xxxxxxxxxx??ooooooooooo
>>
>>x+? = uncertainty about elo Fritz
>>o+? = uncertainty about elo Shredder
>>? = chance that Fritz is stronger than Shredder
>>o = chance that Shredder is stronger than Fritz
>>
>>Tony



Last modified: Thu, 15 Apr 21 08:11:13 -0700
