Author: Robert Hyatt
Date: 12:57:02 10/13/04
On October 13, 2004 at 09:36:13, Graham Laight wrote:

>On October 13, 2004 at 09:09:05, Peter Skinner wrote:
>
>>On October 13, 2004 at 07:51:42, Graham Laight wrote:
>>
>>>I refer you to http://www.talkchess.com/forums/1/message.html?391364 , and I
>>>would be interested to read your comments!
>>
>>One tournament would hardly be a basis to determine the strength of a human or a
>>computer.
>
>This is a good point. However - the results I'm getting back from the simulator
>(linked above) are seriously at odds with some assumptions that some members in
>this thread seem to hold:
>
>1. That the computers at Bilbao had a roughly equal chance of winning. If you
>create a high probability of winning (in order to justify Hydra and Fritz's
>results), then you end up with a startlingly low probability of Junior getting
>the low score that it did - EVEN WITH ONLY 4 GAMES.

Irrelevant. What about the probability of four coin tosses producing 4 heads?
P=1/16, but it happened when I did the test four times only...

You seem to believe that if something has a probability of .5, it is going to
happen 1/2 of the time. Over the "long term" that is true. But a single sample
in the middle of a long sequence can produce most any result imaginable...

This was one event, with one of three programs having a worse result than
expected (or just perhaps the others had a better result than they should
have?)

Statistics applies to the "long term".

>
>2. Joachim used a 50% probability of winning in his post to get acceptable
>probabilities for the 3 different outcomes (3.5/4 x 2 and 1.5/4). However - this
>is at odds with what Dr Hyatt wrote - which is that when contemplating computer
>chess strength, I should "think lower"
>(http://www.talkchess.com/forums/1/message.html?391290)
>
>IMO, in terms of what members have been writing in this thread, these are VERY
>SIGNIFICANT points.
>
>-g

The major flaw here is that you are taking three programs, each playing four
games, and then you are deciding which single program produced the "unexpected
result." As I said, perhaps the other two played way better than expected. With
so few games that is just as likely as the current hypothesis...

>
>>In such case the biggest factor is luck, or lack thereof. That is why rating
>>lists are based on large number of games, vs a pool of players.
>>
>>In the human pools there are so many factors to consider. Fatigue, stress,
>>dehydration are some of the factors.
>>
>>Take tennis for instance. The #1 women's player in the world just announced that
>>she is cutting her season short due to fatigue. The same can happen in just one
>>tournament. A player could be fatigued due to schedule, flight arrangements, or
>>the hooker he purchased the night before keeping him up all night. :)
>>
>>There are just too many unknown factors that one simply can not base a strength
>>assessment on just one tournament.
>>
>>I use a "season" of results to determine what programs I will be purchasing,
>>namely the IPCCC, WCCC, ICCT, and the SSDF list results to determine strength.
>>If one program were to win 3/4 events it is likely that such program is the
>>strongest. Especially when looking at the results it beat the #2 and #3
>>competitors on a regular basis.
>>
>>Peter
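
For anyone who wants to check the coin-toss arithmetic, here is a minimal sketch. It assumes fair, independent tosses and is only an illustration: real chess games have draws and unequal opponents, so this does not model the Bilbao results. The helper name `binom_pmf` and the "three independent series" comparison are my own additions for the example, not something taken from the simulator discussed in this thread.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials (illustrative helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = Fraction(1, 2)  # assumption: a fair coin

# Chance that one series of four tosses comes up all heads.
four_heads = binom_pmf(4, 4, p)

# Chance that at least one of three independent four-toss series is all heads.
# This is the effect of picking out the "surprising" series after the fact.
at_least_one_of_three = 1 - (1 - four_heads) ** 3

print(four_heads, at_least_one_of_three, float(at_least_one_of_three))
```

Under these assumptions a specific four-toss series is all heads 1 time in 16, but if you look at three such series and then pick whichever one looks odd, some series stands out far more often than that. The same caution applies to singling out one program's result after seeing all three.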
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.