Author: blass uri
Date: 15:28:36 07/15/00
On July 15, 2000 at 17:56:09, Robert Hyatt wrote:

>On July 15, 2000 at 17:20:18, blass uri wrote:
>
>>On July 15, 2000 at 16:59:32, Mogens Larsen wrote:
>>
>>>On July 15, 2000 at 16:45:19, ShaktiFire wrote:
>>>
>>>>Chris Carson has documented dozens of games at standard time control
>>>>of computer play vs. GMs.
>>>>
>>>>I won't knit pick...this or that program, this or that hardware.
>>>>
>>>>But in the last 2 years, dozens of games have been played. Computers
>>>>vs. GMs at standard time control.
>>>>
>>>>Ratings can be calculated with these games. The more games played,
>>>>the less uncertainty in the rating. The rating indicated, based
>>>>on these dozens of games is over 2500.
>>>
>>>You can't include games from all types of programs on all types of hardware
>>>under different game conditions (tournament, exhibition or something else) and
>>>reach a sound conclusion. Given the number of programs and hardware
>>>configurations, you can't say that computer programs as a single entity are of
>>>GM strength. You need an identical setup, software and hardware, and then
>>>conduct enough games to reduce the uncertainty sufficiently to ensure a
>>>confident rating above 2500. The scientific method is testing using a stable and
>>>unchanged setup.
>>
>>If you have many programs that have performance of more than 2500 you can be
>>sure that the best of them has more than 2500 rating.
>
>That is simply unsound logic. I do ten trials of flipping a coin 10 times. In
>7 of the trials I get more heads than tails. Does this mean that _one_ of those
>seven trials is certainly the _truth_?

If you decide on a set of programs in advance and see that, taken together, they have a performance of more than 2500 (treating them as if they were a single program), then with enough games you can be practically sure that at least one of them has a rating of more than 2500.
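A minimal sketch of this argument, assuming the standard Elo logistic expected-score formula. The program names, ratings, game counts and opponent rating are hypothetical and are not taken from any real results. It works with expected scores only, so it illustrates the "enough games" limit and says nothing about small samples:

```python
import math

def expected_score(rating, opponent):
    """Standard Elo logistic expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def performance_rating(score_fraction, opponent):
    """Invert expected_score: the rating that would score `score_fraction`."""
    return opponent - 400.0 * math.log10(1.0 / score_fraction - 1.0)

# Hypothetical data: name -> (true rating, games played). Not real results.
programs = {"A": (2450, 20), "B": (2520, 30), "C": (2580, 10)}
opponent = 2550  # hypothetical rating of the GM opposition

total_games = sum(games for _, games in programs.values())

# Pooling the games as if one program played them all gives a score that is
# a games-weighted average of the individual expected scores.
pooled_score = sum(
    expected_score(rating, opponent) * games
    for rating, games in programs.values()
) / total_games

pooled_perf = performance_rating(pooled_score, opponent)
best = max(rating for rating, _ in programs.values())
worst = min(rating for rating, _ in programs.values())

print(f"pooled performance: {pooled_perf:.1f}")
print(f"worst and best true ratings: {worst}, {best}")
```

Because the pooled score is a weighted average and the performance formula increases with the score, the pooled performance cannot exceed the rating of the best program in the set. So in the long run a pooled performance above 2500 requires at least one program above 2500.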
>
>What if person A picks the results from ten such trials, and in nearly every
>trial he picked, more heads than tails came up. What if others have run other
>trials but didn't publicize their results. And their trials came up mostly
>tails?

I agree that if the choice of programs is selective, this proves nothing, but I assumed that the >2500 performance was based on games at tournament time control and that all the public games of commercial programs, or of some other good programs, were counted.

I agree that there should be a clear definition, made before every event, of which programs are good, in order to decide whether to count them, because I do not want to count programs like TSCP even if they play in human events.

Uri
Last modified: Thu, 15 Apr 21 08:11:13 -0700