Computer Chess Club Archives


Subject: Re: This Super Laptop with Fritz 8 would even beat Judith Polgar!

Author: Dann Corbit

Date: 10:38:56 06/15/04


On June 12, 2004 at 16:57:06, Peter Fendrich wrote:
>On June 11, 2004 at 23:35:39, Dann Corbit wrote:
>>On June 11, 2004 at 09:54:54, Peter Fendrich wrote:
>>>On June 09, 2004 at 20:24:52, Dann Corbit wrote:
>>>>On June 09, 2004 at 19:27:37, Derek Paquette wrote:
>>>>>On June 09, 2004 at 19:23:11, Dann Corbit wrote:
>>>>>>On June 09, 2004 at 19:07:39, Derek Paquette wrote:
>>>>>>>On June 09, 2004 at 18:49:40, Jorge Pichard wrote:
>>>>>>>>Taking on a 3400+ AMD 64 with 2 GB RAM and Fritz 8
>>>>>>>>http://www.chessbase.com/newsdetail.asp?newsid=1703
>>>>>>>this is very annoying for someone who is a chess enthusiast like myself.
>>>>>>>why would the company that is marketing this laptop, RISK using a program that
>>>>>>>is 40 elo LOWER?
>>>>>>>i just don't get it,
>>>>>>>i think it comes down to plain old ignorance of chess programs
>>>>>>>why NOT use shredder 8?
>>>>>>>this is very frustrating, because we never get to see shredder 8 in action vs
>>>>>>>grandmasters at tournament time controls.
>>>>>>
>>>>>>Probably, they have a good reason.
>>>>>>For instance, they might take 7.04 and analyze every game she has ever played
>>>>>>at very slow time control.  Now, they have a database and expected response for
>>>>>>most of the moves she is likely to make.
>>>>>>
>>>>>>Perhaps the analysis started long ago.  They know for sure exactly how it would
>>>>>>work with 7.04
>>>>>>
>>>>>>Bleeding edge is not always the best thing, if you want a reliable outcome.
>>>>>>For the same reason, we won't always see the fastest possible hardware.  It
>>>>>>could be that the fastest stuff has not been tested.  It would be a mistake to
>>>>>>try an untested system.
>>>>>
>>>>>that is very true, if shredder 8 was released last week, HOWEVER,
>>>>>shredder 8 has been released long enough for the following to happen,
>>>>>SSDF has had enough time to test it
>>>>>ICC is full of shredder 8 (and it is turning humans into mincemeat)
>>>>>
>>>>>that is enough to say that the program is well tested, and that it would kick
>>>>>the crap out of a human, because it's certainly beating around fritz 8.
>>>>
>>>>It is not known whether Fritz 8 would do better against humans than Shredder 8.
>>>>
>>>>We might surmise it from SSDF and WMCCC results, but that is really an
>>>>extrapolation that may not be correct.
>>>
>>>I agree 100%. It's an extrapolation - only experience can tell if it's right.
>>>
>>>>
>>>>At any rate, even the SSDF Elo strength rating does not decide who is
>>>>stronger:
>>>
>>>This is not the right way to interpret the table. I should know as I once
>>>designed that table :-)
>>>First: The ratings 2818 for Shredder and 2790 for Deep Fritz are their ratings
>>>to the best of our knowledge, given the information we have from results. That
>>>is the best we can say, regardless of confidence.
>>
>>I think "x +/- y" is a better way to say it.
>
>No, that's wrong. The rating is well defined exactly as one value. No fuzziness
>at all even if it will change when you add new games. The "real" rating (we
>haven't even defined what it is here) we will never know, but it's exactly one
>value that never changes.
>The interval was invented by me and is not used by Arpad Elo.
>It is not exactly the same thing to claim that there is a 95% probability that
>the interval covers the "real" rating and that the "real" rating is x +/- y.
>Think about it: the interval jumps around depending on the games, but the
>"real" rating sits still.
>
>>Either that, or round to 2 digits
>>of accuracy (which is about what is present).  It's not like a measurement of
>>the height of a tree or the weight of a metal mass.
>
>Yes, it is by definition!

You can define it to be the number calculated.
I can calculate the Elo of Program x as 2350.6932471 against a pool of 4
programs after 16 games.

The real number is somewhere between 1000 and 3000.  Even the first digit has no
significance.

My point is that the digits after the 2 have no significance whatsoever.
As more and more games are built up, more and more digits have meaning.  If we
were to play one trillion games, I could get perhaps 8-9 significant digits.
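
As a rough illustration of how the error bars shrink with the number of games, here is a minimal Python sketch. It is not the SSDF's method: it assumes evenly matched opponents, a worst-case per-game standard deviation of 0.5 (no draws), and a normal approximation. The function name elo_margin_95 is made up for this example.

```python
import math

def elo_margin_95(games, per_game_sd=0.5):
    """Approximate 95% margin, in Elo points, on a rating difference.

    Assumptions (not from the thread): evenly matched opponents,
    independent games, and a normal approximation. per_game_sd=0.5 is
    the worst case with no draws; draws make it smaller.
    """
    # Standard error of the average score per game.
    se_score = per_game_sd / math.sqrt(games)
    # Slope of the Elo curve D(s) = -400*log10(1/s - 1) at s = 0.5.
    slope = 400.0 / math.log(10) / (0.5 * 0.5)
    return 1.96 * slope * se_score

for n in (10, 16, 100, 10_000, 1_000_000, 10**12):
    print(f"{n:>16,d} games: +/- {elo_margin_95(n):.4f} Elo")
```

Under these assumptions the margin is about 681/sqrt(n) Elo points, so 16 games give roughly +/- 170, and each extra significant digit costs about 100 times as many games.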

>>It's a broad sample of data
>>from a collection which we know will experience a lot of randomness.
>>
>>>Second: The interval is another story. We don't know the real rating point. The
>>>interval [2786,2852] for Shredder is covering the real point with a confidence
>>>of 95% given the information we have.
>>>To add and subtract the ratings for different individuals to find out if we have
>>>an overlap is not the right way to go. If they overlap we can't say anything
>>>about where the two real ratings are placed without doing some more math. If the
>>>interval from one of them covers the estimated rating of the other, as it
>>>does in this case (2786 is less than 2790), we could probably make some kind of
>>>statement.
>>>/Peter
>>
>>My point was that:
>>1.  The ratings are fuzzy numbers, not numbers with 4 digits of precision.
>>2.  The rating "area of fuzziness" overlaps for the two programs.  That means it
>>is reasonable to say that it is not proven which is stronger.
>
>That is NEVER proven regardless of number of games.
>An overlap by two 95%-areas can't easily be translated to anything meaningful.

If you have played 100,000 games and the Elo of program A is 2500 +/- 1 and the
Elo of program B is 2400 +/- 1, then A is stronger than B with a probability
very nearly 1.0.  In other words, we can be more sure that A is stronger than B
than that the light will come on when we flip the switch (power might
be off, it might burn out at the moment of the switch, the switch could go
defective, a rat could chew the wires...)
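
One hedged way to put a number on "A is stronger than B" is sketched below in Python. It treats each "+/- y" as a 95% interval around a normally distributed estimate and treats the two estimates as independent. Both are simplifying assumptions that ratings from a shared pool do not strictly satisfy. The function name prob_a_stronger is hypothetical. The second call reuses the SSDF figures quoted earlier in the thread (2818 and 2790); Deep Fritz's interval is not given here, so a half-width of 33 is assumed for both programs purely for illustration.

```python
import math

def prob_a_stronger(rating_a, half_width_a, rating_b, half_width_b):
    """Approximate P(true rating of A > true rating of B).

    Assumptions (not from the thread): each half-width is a 95% interval
    on a normal estimate, and the two estimates are independent.
    """
    sd_a = half_width_a / 1.96
    sd_b = half_width_b / 1.96
    # z-score of the rating difference against its combined spread.
    z = (rating_a - rating_b) / math.hypot(sd_a, sd_b)
    # Standard normal CDF written with the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))

# The 100,000-game example: 2500 +/- 1 against 2400 +/- 1.
print(prob_a_stronger(2500, 1, 2400, 1))

# SSDF-style example; the +/- 33 for both programs is an assumed value.
print(prob_a_stronger(2818, 33, 2790, 33))
```

With these assumptions the first case is indistinguishable from 1.0 in floating point, while the second comes out well short of certainty, which is the sense in which overlapping intervals leave the question open.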

>>In LIKELIHOOD, the higher program is probably stronger than the lower program.
>>
>>If someone saw:
>>Program x 2739.0123
>>Program y 2699.7865
>>
>>Those may be the absolute x-bar ratings.
>>
>>But if only 10 games were played, only the first digit has any meaning.
>>
>>So the error bars say as much or more about the meaning of the ratings than the
>>average itself does.
>
>Well after 10 games you can't even rely on the accuracy of error bars and
>shouldn't use them (based on the bell curve) but the rating is well defined as
>one value. "x's rating after 10 games is 2739" is a correct statement.

That is misleading and very bad science.
Why not say that the rating is
2739.8356245494183672715153891736273563
?
Even though you are not even sure about the leading 2.

>/Peter




Last modified: Thu, 15 Apr 21 08:11:13 -0700
