Author: Peter Fendrich
Date: 13:57:06 06/12/04
On June 11, 2004 at 23:35:39, Dann Corbit wrote:

>On June 11, 2004 at 09:54:54, Peter Fendrich wrote:
>
>>On June 09, 2004 at 20:24:52, Dann Corbit wrote:
>>
>>>On June 09, 2004 at 19:27:37, Derek Paquette wrote:
>>>
>>>>On June 09, 2004 at 19:23:11, Dann Corbit wrote:
>>>>
>>>>>On June 09, 2004 at 19:07:39, Derek Paquette wrote:
>>>>>
>>>>>>On June 09, 2004 at 18:49:40, Jorge Pichard wrote:
>>>>>>
>>>>>>>Taking on a 3400+ AMD 64 with 2 GB RAM and Fritz 8
>>>>>>>http://www.chessbase.com/newsdetail.asp?newsid=1703
>>>>>>
>>>>>>this is very annoying for someone who is a chess enthusiast like myself.
>>>>>>
>>>>>>why would the company that is marketing this laptop, RISK using a program that
>>>>>>is 40 elo LOWER?
>>>>>>i just don't get it,
>>>>>>i think it comes down to plain old ignorance of chess programs
>>>>>>why NOT use shredder 8?
>>>>>>this is very frustrating, because we never get to see shredder 8 in action vs
>>>>>>grandmasters at tournament time controls.
>>>>>
>>>>>Probably, they have a good reason.
>>>>>For instance, they might take 7.04 and analyze every game she has ever played
>>>>>at very slow time control. Now, they have a database and expected response for
>>>>>most of the moves she is likely to make.
>>>>>
>>>>>Perhaps the analysis started long ago. They know for sure exactly how it would
>>>>>work with 7.04
>>>>>
>>>>>Bleeding edge is not always the best thing, if you want a reliable outcome.
>>>>>For the same reason, we won't always see the fastest possible hardware. It
>>>>>could be that the fastest stuff has not been tested. It would be a mistake to
>>>>>try an untested system.
>>>>
>>>>that is very true, if shredder 8 was released last week, HOWEVER,
>>>>shredder 8 has been released long enough for the following to happen,
>>>>SSDF has had enough time to test it
>>>>ICC is full of shredder 8 (and it turning humans into mince meat)
>>>>
>>>>that is enough to say that the program is well tested, and that it would kick
>>>>the crap out of a human, because its certainly beating around fritz 8.
>>>
>>>It is not known whether Fritz 8 would do better against humans than Shredder 8.
>>>
>>>We might surmise it from SSDF and WMCCC results, but that is really an
>>>extrapolation that may not be correct.
>>
>>I agree to 100%. It's an extrapolation - only experience can tell if it's right.
>>
>>>
>>>At any rate, even the SSDF Elo strength rating also does not decide who is
>>>stronger:
>>
>>This is not the right way to interpret the table. I should know as I once
>>designed that table :-)
>>First: The ratings 2818 for Shredder and 2790 for Deep Fritz are their ratings
>>to the best of our knowledge, given the information we have from results. That
>>is the best we can say, regardless of confidence.
>
>I think "x +/- y" is a better way to say it.

No, that's wrong. The rating is well defined as exactly one value. There is no
fuzziness at all, even though it will change when you add new games.
The "real" rating (we haven't even defined what it is here) we will never know,
but it's exactly one value that never changes.
The interval was invented by me and is not used by Arpad Elo.
Claiming that there is a 95% probability that the interval covers the "real"
rating is not exactly the same thing as claiming that the "real" rating is
x +/- y.
Think about it: the interval jumps around depending on the games, but the
"real" rating sits still.

>Either that, or round to 2 digits
>of accuracy (which is about what is present). It's not like a measurement of
>the height of a tree or the weight of a metal mass.

Yes, it is by definition!
>It's a broad sample of data
>from a collection which we know will experience a lot of randomness.
>
>>Second: The interval is another story. We don't know the real rating point. The
>>interval [2786,2852] for Shredder is covering the real point with a confidence
>>of 95% given the information we have.
>>To add and subtract the ratings for different individuals to find out if we have
>>an overlap is not the right way to go. If they overlap we can't say anything
>>about where the two real ratings are placed without doing some more math. If the
>>interval from one of them is covering the estimated rating of the other, as it
>>does in this case (2786 is less than 2790), we could probably make some kind of
>>statement.
>>/Peter
>
>My point was that:
>1. The ratings are fuzzy numbers, not numbers with 4 digits of precision.
>2. The rating "area of fuzziness" overlaps for the two programs. That means it
>is reasonable to say that it is not proven which is stronger.

That is NEVER proven, regardless of the number of games. An overlap of two 95%
areas can't easily be translated into anything meaningful.

>In LIKELIHOOD, the higher program is probably stronger than the lower program.
>
>If someone saw:
>Program x 2739.0123
>Program y 2699.7865
>
>Those may be the absolute x-bar ratings.
>
>But if only 10 games were played, only the first digit has any meaning.
>
>So the error bars say as much or more about the meaning of the ratings than the
>average itself does.

Well, after 10 games you can't even rely on the accuracy of the error bars (they
are based on the bell curve) and shouldn't use them, but the rating is still
well defined as one value.
"x's rating after 10 games is 2739" is a correct statement.
/Peter
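
The distinction drawn above (a fixed "real" rating versus an interval that moves with the games played) can be illustrated with a small simulation. This is only a rough sketch under loud assumptions, not the SSDF procedure: the "real" rating is pretended to be known (2818, borrowed from the thread purely as an example), every game is against a single opponent fixed at 2790, there are no draws, the standard logistic Elo curve is used, and the 95% interval is a plain bell-curve (normal approximation) interval on the score. All function names are made up for this illustration.

```python
import math
import random


def expected_score(rating_diff):
    """Expected score under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_from_score(score, opponent_rating):
    """Invert the Elo curve: rating implied by a score fraction in (0, 1)."""
    return opponent_rating + 400.0 * math.log10(score / (1.0 - score))


def estimate_with_interval(true_rating, opponent_rating, n_games, rng):
    """Simulate one match; return (estimate, low, high) in rating points.

    Assumptions: no draws, one opponent of fixed rating, and a
    normal-approximation 95% interval on the score fraction.
    """
    p_win = expected_score(true_rating - opponent_rating)
    wins = sum(1 for _ in range(n_games) if rng.random() < p_win)
    # Keep the score away from 0 and 1 so the logarithm stays finite.
    half_point = 0.5 / n_games
    score = min(max(wins / n_games, half_point), 1.0 - half_point)
    std_err = math.sqrt(score * (1.0 - score) / n_games)
    eps = 1e-9
    low = max(score - 1.96 * std_err, eps)
    high = min(score + 1.96 * std_err, 1.0 - eps)
    return (
        rating_from_score(score, opponent_rating),
        rating_from_score(low, opponent_rating),
        rating_from_score(high, opponent_rating),
    )


def coverage(true_rating, opponent_rating, n_games, trials, seed=0):
    """Fraction of simulated matches whose interval contains the true rating."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, low, high = estimate_with_interval(
            true_rating, opponent_rating, n_games, rng
        )
        if low <= true_rating <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(42)
    # The true rating never moves; the estimate and interval do.
    for _ in range(3):
        est, low, high = estimate_with_interval(2818, 2790, 1000, rng)
        print(f"estimate {est:.0f}, interval [{low:.0f}, {high:.0f}]")
    print("coverage, 1000 games:", coverage(2818, 2790, 1000, 2000))
    print("coverage,   10 games:", coverage(2818, 2790, 10, 2000))
```

Each simulated match yields one well-defined estimate, and the interval around it shifts from match to match while the true value stays put. With long matches the interval should contain the true rating in roughly 95% of runs; with only 10 games the bell-curve approximation is poor and the coverage drops, which is in line with the remark that the error bars should not be relied on after so few games.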