Author: Dann Corbit
Date: 00:30:53 11/20/99
On November 20, 1999 at 02:42:03, blass uri wrote:

>On November 20, 1999 at 00:37:32, Wayne Lowrance wrote:
>
><snipped>
>>At the moment, for me, the only gauge is SSDF. Its results seem to stand up
>>pretty good. SSDF has said for a few years now that Fritz was the strongest
>>program
>
>The ssdf did not say it because they did not test all the programs.
>The ssdf is saying now that chessmaster6000 has better rating than Fritz on the
>same hardware and that tiger has clearly better performance than Fritz.

From what we have seen so far, all programs in the top 8 or so are peers in ability. In other words, their ratings fall within a single standard deviation of one another, so there is nothing to tell which is the stronger. The mean value may be slightly higher for some programs, but unless you play a bazillion games, there really is not enough data to separate them with statistical confidence.

Within one standard deviation, you are really saying: This program has an Elo strength *relative to this pool of peers* which has about a 68% chance of being between the "+" mark and the "-" mark. It takes *two* standard deviations to be about 95% sure.

IOW, which program tops the SSDF list is pretty much a crap-shoot. On the other hand, a higher x-bar (sample mean) is a real indication of strength. It just is not as certain as most people seem to think it is.
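To put rough numbers on the error-bar argument, here is a minimal Python sketch. It is not how the SSDF computes its list; it assumes a plain normal approximation on the per-game score and the usual logistic Elo formula. The function names (`score_to_elo`, `elo_interval`) and the win/draw/loss counts are made up for illustration.

```python
import math


def _clamp(score, eps=1e-9):
    """Keep a score strictly inside (0, 1) so the Elo formula stays finite."""
    return min(max(score, eps), 1.0 - eps)


def score_to_elo(score):
    """Convert an average score (0 < score < 1) to an Elo difference
    using the standard logistic formula."""
    return -400.0 * math.log10(1.0 / _clamp(score) - 1.0)


def elo_interval(wins, draws, losses, z=1.0):
    """Return (low, mean, high) Elo difference for a match result.

    z is the number of standard deviations: z=1 is roughly a 68% band,
    z=2 roughly a 95% band, under a normal approximation (an assumption).
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score, counting a draw as half a point.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (score_to_elo(mean - z * se),
            score_to_elo(mean),
            score_to_elo(mean + z * se))


if __name__ == "__main__":
    # Hypothetical 100-game match, not real SSDF data.
    for z in (1.0, 2.0):
        low, mid, high = elo_interval(wins=45, draws=20, losses=35, z=z)
        print(f"z={z:.0f}: {low:+.0f} .. {mid:+.0f} .. {high:+.0f} Elo")
```

Going from z=1 to z=2 makes the band wider, and playing more games makes it narrower, which is the whole point: a small gap in the mean between two top programs can sit comfortably inside either band.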
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.