Computer Chess Club Archives

Subject: Re: Why Did Junior Underperform So Badly In Bilbao?

Author: Robert Hyatt
Date: 12:57:02 10/13/04
On October 13, 2004 at 09:36:13, Graham Laight wrote:

>On October 13, 2004 at 09:09:05, Peter Skinner wrote:
>
>>On October 13, 2004 at 07:51:42, Graham Laight wrote:
>>
>>>I refer you to http://www.talkchess.com/forums/1/message.html?391364 , and I
>>>would be interested to read your comments!
>>
>>One tournament would hardly be a basis to determine the strength of a human or a
>>computer.
>
>This is a good point. However - the results I'm getting back from the simulator
>(linked above) are seriously at odds with some assumptions that some members in
>this thread seem to hold:
>
>1. That the computers at Bilbao had a roughly equal chance of winning. If you
>create a high probability of winning (in order to justify Hydra and Fritz's
>results), then you end up with a startlingly low probability of Junior getting
>the low score that it did - EVEN WITH ONLY 4 GAMES.

Irrelevant.  What about the probability of four coin tosses producing 4 heads?
P=1/16, but it happened when I did the test only four times...

You seem to believe that if something has a probability of .5, it is going to
happen 1/2 of the time.  Over the "long term" that is true.  But a single sample
in the middle of a long sequence can produce most any result imaginable...
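To put numbers on the coin-toss point, here is a minimal sketch using exact fractions. The helper names (`prob_exact_heads`, `prob_at_least_once`) are made up for illustration, and the "repeat the experiment four times" case is an assumption added to show how quickly an unlikely result becomes plausible once there are several chances for it to occur.

```python
from fractions import Fraction
from math import comb

def prob_exact_heads(n_tosses, n_heads, p=Fraction(1, 2)):
    """Binomial probability of exactly n_heads heads in n_tosses fair-coin tosses."""
    return comb(n_tosses, n_heads) * p**n_heads * (1 - p)**(n_tosses - n_heads)

def prob_at_least_once(p_event, n_trials):
    """Chance that an event of probability p_event shows up at least once
    in n_trials independent repetitions (hypothetical helper)."""
    return 1 - (1 - p_event)**n_trials

p_four_heads = prob_exact_heads(4, 4)
print(p_four_heads)                         # 1/16
print(prob_at_least_once(p_four_heads, 4))  # 14911/65536, roughly 0.23
```

A 1/16 event is rare in any single short sample, but it is far from impossible, and it says nothing about the long-run frequency.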

This was one event, with one of three programs having a worse result than
expected (or perhaps the others just had a better result than they should have?)
Statistics applies to the "long-term".


>
>2. Joachim used a 50% probability of winning in his post to get acceptable
>probabilities for the 3 different outcomes (3.5/4 x 2 and 1.5/4). However - this
>is at odds with what Dr Hyatt wrote - which is that when contemplating computer
>chess strength, I should "think lower"
>(http://www.talkchess.com/forums/1/message.html?391290)
>
>IMO, in terms of what members have been writing in this thread, these are VERY
>SIGNIFICANT points.
>
>-g


The major flaw here is that you are taking three programs, each playing four
games, and then deciding after the fact which single program produced the
"unexpected result."  As I said, perhaps the other two played way better than
expected.  With so few games that is just as likely as the current hypothesis...
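The selection effect can be sketched as follows. This is only an illustration under loud assumptions: the per-game win/draw probabilities are invented, all three programs are treated as identical, and their results are treated as independent. None of these numbers come from Bilbao. The point is just that the chance that some program out of three lands a low score is larger than the chance that one pre-chosen program does.

```python
from fractions import Fraction
from itertools import product

def score_distribution(n_games, p_win, p_draw):
    """Exact distribution of total score over n_games independent games
    (win = 1, draw = 1/2, loss = 0)."""
    p_loss = 1 - p_win - p_draw
    outcomes = [(Fraction(1), p_win), (Fraction(1, 2), p_draw), (Fraction(0), p_loss)]
    dist = {}
    for combo in product(outcomes, repeat=n_games):
        score = sum(points for points, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        dist[score] = dist.get(score, 0) + prob
    return dist

def prob_score_at_most(dist, threshold):
    """Probability that the total score is <= threshold."""
    return sum(p for score, p in dist.items() if score <= threshold)

# Hypothetical per-game probabilities, assumed only for this sketch.
P_WIN, P_DRAW = Fraction(1, 2), Fraction(1, 4)

dist = score_distribution(4, P_WIN, P_DRAW)
p_one = prob_score_at_most(dist, Fraction(3, 2))  # one named program scores <= 1.5/4
p_any = 1 - (1 - p_one) ** 3                      # at least one of three does
print(float(p_one), float(p_any))
```

Whatever probabilities are plugged in, `p_any` exceeds `p_one`, which is why picking the outlier after seeing the results overstates how surprising it is.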



>
>>In such a case the biggest factor is luck, or lack thereof. That is why rating
>>lists are based on a large number of games, vs a pool of players.
>>
>>In the human pools there are so many factors to consider. Fatigue, stress,
>>dehydration are some of the factors.
>>
>>Take tennis for instance. The #1 women's player in the world just announced that
>>she is cutting her season short due to fatigue. The same can happen in just one
>>tournament. A player could be fatigued due to schedule, flight arrangements, or
>>the hooker he purchased the night before keeping him up all night. :)
>>
>>There are just too many unknown factors, so one simply cannot base a strength
>>assessment on just one tournament.
>>
>>I use a "season" of results to determine what programs I will be purchasing,
>>namely the IPCCC, WCCC, ICCT, and the SSDF list results to determine strength.
>>If one program were to win 3/4 events it is likely that such a program is the
>>strongest, especially when the results show that it beat the #2 and #3
>>competitors on a regular basis.
>>
>>Peter
Last modified: Thu, 15 Apr 21 08:11:13 -0700