Computer Chess Club Archives


Subject: Re: EVIDENCE That Junior REALLY DID Perform Very Badly In Bilbao

Author: Graham Laight

Date: 01:33:31 10/15/04


On October 14, 2004 at 16:13:48, Sandro Necchi wrote:

>On October 13, 2004 at 15:51:02, Robert Hyatt wrote:
>
>>On October 13, 2004 at 11:11:47, Graham Laight wrote:
>>
>>>On October 13, 2004 at 10:55:20, Michael Yee wrote:
>>>
>>>>On October 13, 2004 at 10:42:08, Graham Laight wrote:
>>>>>On October 13, 2004 at 10:33:30, Michael Yee wrote:
>>>>
>>>>>>have 1 "bad" (or underperforming) tournament out of 20, i.e., with low
>>>>>>probability. But the rare event *will* (or could) happen at some point.
>>>>>
>>>>>Please see the answer I gave in
>>>>>http://www.talkchess.com/forums/1/message.html?391399
>>>>>
>>>>>-g
>>>>>
>>>>>>Michael
>>>>
>>>>No offense, but I don't think I understand what your point is. Your simulation
>>>
>>>My points (made throughout the thread - not just in the previous post in this
>>>branch of the thread) are:
>>>
>>>1. Given the Hydra and Fritz results, the Junior result is unexpectedly low
>>
>>
>>What would you do if you took four humans, and four copies of fritz or hydra and
>>played the _same_ event again?  And what would you say if one of the copies of
>>Fritz produced 3 draws and a loss?  "It did poorly?"  Or "unexpected random
>>chance?"
>>
>>It is almost a certainty that all 4 copies would _not_ produce the same
>>result...
>>
>Bob,
>
>you are correct, but we are "old-fashioned". I hope you do not get upset; I mean
>we analyze things deeply and try to give explanations for them.
>
>The "modern" way is to give very quick estimations/evaluations based on scores
>from a limited number of games and/or events.
>
>This is why one program can go from the top to the lowest level, and the other
>way around, so easily. This happens in sports too.
>
>Of course not everybody thinks this way, but more and more people seem to do it,
>probably because understanding things needs a lot of specific knowledge and
>experience, which require a lot of time that people do not have or are not
>willing to invest. So it is easier to make fast comments; not so easy to
>make deep analysis...

In this case, you're not correct, I'm afraid. Unless you do the maths, you could
easily fail to realise that a small sample is actually giving you more
information than a large sample. For an example of this, please see
http://www.talkchess.com/forums/1/message.html?391534

-g

>Sandro
>>
>>>
>>>2. The Hydra and Fritz results taken together are an indication of great
>>>strength
>>>
>>>>(or even just a basic probability calculation) shows that a "low" score for an
>>>>engine that is assumed to have a certain strength is a rare event. I don't
>>>>disagree with that. I'm just confused about what conclusions you're trying to
>>>>draw from witnessing a rare event.
>>>>
>>>>Here's how I might put bilbao in perspective: Suppose we are looking at this
>>>>tournament as simply one in a stream of tournaments, and we consider updating
>>>>junior's rating (i.e., strength estimate) in a bayesian way. Then junior's past
>>>>results would weigh much more heavily than this one new result and the rating
>>>>wouldn't change by much.
>>>>
>>>>What would I conclude? Probably that junior had a (slightly) rare result.
>>>
>>>The Junior result is probably not too far away from what you'd expect. Perhaps I
>>>have been looking in astonishment at the wrong place. Perhaps the astonishment
>>>should be focused upon the 7/8 score which Hydra and Fritz achieved - which is
>>>highly improbable (I calculated 1/160 in another post in this thread) unless
>>>these two computers are substantially better than the opponents that they faced
>>>at Bilbao.
>>>
>>>-g
>>>
>>>>Michael
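
For anyone who wants to check a figure like the 1/160 quoted above, here is a minimal sketch of one way to compute the chance of a combined score such as 7/8. The calculation behind the 1/160 is not shown in this message, so the model here is an assumption: independent games, each with the same fixed win/draw/loss probabilities. The function names and the example probabilities are made up for illustration, and the answer depends heavily on the draw rate you assume.

```python
from fractions import Fraction

def score_distribution(games, p_win, p_draw):
    """Return {half_points: probability} for the total score over `games`
    independent games (assumed model: fixed win/draw/loss probabilities)."""
    p_loss = 1 - p_win - p_draw
    dist = {0: Fraction(1)}
    for _ in range(games):
        nxt = {}
        for half_points, prob in dist.items():
            # A win adds 2 half-points, a draw 1, a loss 0.
            for gain, p in ((2, p_win), (1, p_draw), (0, p_loss)):
                nxt[half_points + gain] = nxt.get(half_points + gain, 0) + prob * p
        dist = nxt
    return dist

def prob_at_least(half_points, games, p_win, p_draw):
    """Probability of scoring at least `half_points` half-points in `games` games."""
    dist = score_distribution(games, p_win, p_draw)
    return sum(prob for hp, prob in dist.items() if hp >= half_points)

if __name__ == "__main__":
    # Hypothetical equal-strength case: wins and losses equally likely,
    # half of all games drawn. 7 points out of 8 is 14 half-points.
    p = prob_at_least(14, 8, Fraction(1, 4), Fraction(1, 2))
    print(p, float(p))
```

Changing p_win and p_draw shows how quickly the probability of 7/8 rises once one side is assumed to be stronger, which is the comparison the quoted argument turns on.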



Last modified: Thu, 15 Apr 21 08:11:13 -0700
