Author: Dann Corbit
Date: 17:52:05 01/13/00
On January 13, 2000 at 20:23:27, Luis E. Alvarado wrote:

[snip]
>You have a point but, that is why we are forze to rely in the SSDF ratings. If
>FRITZ is rated Higher than Rebel, Then it is stronger.

The SSDF ratings also have a standard deviation figure. If that number is taken into account, even within one standard deviation, it is not certain which is stronger -- Fritz or Rebel. Nor does it matter much. 99.99% of the people who buy either program will not be able to beat it 99.99% of the time. But it does make for good one-upmanship. "My program can knock the stuffings out of your program."

OTOH, being ranked in the top ten of that list is a sure indicator of very high strength. And the higher up you are, the stronger you probably are. But you are not _provably_ stronger than the other programs within one standard deviation (or, to put it better, the certainty that you are stronger is very low).

Consider this (current) list:
http://home3.swipnet.se/~w-36794/ssdf/nr000.htm

Here are the top programs (those which have been benched on the 450 MHz machines):

   Program                                  Rating    +    -  Games  Won  Average opposition
1  Chess Tiger 12.0 DOS 128MB K6-2 450 MHz    2696   44  -40    317  72%  2533
2  Fritz 5.32 128MB K6-2 450 MHz              2671   45  -41    297  72%  2506
3  Nimzo 7.32 128MB K6-2 450 MHz              2663   37  -35    409  69%  2526
4  Nimzo 99 128MB K6-2 450 MHz                2644   52  -48    214  67%  2520
5  Hiarcs 7.32 128MB K6-2 450 MHz             2636   42  -39    320  67%  2509
6  Junior 5.0 128MB K6-2 450 MHz              2619   54  -50    190  65%  2508

The relative ELO of Chess Tiger is 2696 +44/-40 ELO points (to within one standard deviation). That means that in this pool of programs, the ELO of CT is between 2740 and 2656 with a probability of about 2/3 of being correct. If we double the standard deviation, the probability will increase to over 9/10. Under that idea, the ELO of CT could possibly be as high as 2784 or as low as 2616 if we want to be fairly certain that we have the true mark. As more games are played, the band will get narrower.
If we played an infinite number of games, the width would be zero and we would know exactly the true ELO.

Now, the lowest one on this list of those tested at 450 MHz is Junior. With an ELO of 2619 +54/-50, going out two standard deviations in each direction would give us an ELO between 2727 and 2519. So...

CT true ELO probably between [2784 and 2616]
JR true ELO probably between [2727 and 2519]

Notice that 2727 is more than one hundred points higher than 2616. So Junior could (theoretically) be the stronger program. It may be more likely that it is the other way around, but we *really* don't know for certain. It could also be as weak as 2519 in that pool. The true figure is *probably* closer to the stated average but (again) we just can't tell from the data.

The SSDF is probably the strongest indicator for comp/comp performance at the exact stated conditions of the experiment. However, as you can easily see, it does not show what most people think it does.

OTOH, I am very glad that Chess Tiger leads the list because I think Christophe Theron is a nice guy. See -- I get emotionally attached too. So much for scientific objectivity.
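
For anyone who wants to redo the arithmetic, here is a rough Python sketch of the band calculation used in this post. It assumes, as the post does, that the listed + and - figures are one standard deviation and that simply doubling them gives the two-standard-deviation band. The names `band` and `overlaps` are just illustrative helpers, not anything from the SSDF itself.

```python
# Ratings and +/- margins copied from the SSDF table quoted in this post.
# Assumption: the margins are one standard deviation, as the post treats them.
PROGRAMS = {
    "Chess Tiger 12.0": (2696, 44, 40),
    "Fritz 5.32": (2671, 45, 41),
    "Nimzo 7.32": (2663, 37, 35),
    "Nimzo 99": (2644, 52, 48),
    "Hiarcs 7.32": (2636, 42, 39),
    "Junior 5.0": (2619, 54, 50),
}


def band(rating, plus, minus, k=1):
    """Return (low, high) after widening the listed margins by a factor k."""
    return rating - k * minus, rating + k * plus


def overlaps(a, b):
    """True when two (low, high) bands share at least one rating value."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Compare every program's two-standard-deviation band with the leader's.
    leader = band(*PROGRAMS["Chess Tiger 12.0"], k=2)
    for name, (rating, plus, minus) in PROGRAMS.items():
        low, high = band(rating, plus, minus, k=2)
        shared = overlaps(leader, (low, high))
        print(f"{name:18s} {low} to {high}  overlaps leader: {shared}")
```

Overlapping bands only say that the data cannot rule out either ordering; they do not say how likely each ordering is.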