Computer Chess Club Archives


Subject: Re: EVIDENCE That Junior REALLY DID Perform Very Badly In Bilbao

Author: Sandro Necchi

Date: 03:51:08 10/15/04



On October 15, 2004 at 04:33:31, Graham Laight wrote:

>On October 14, 2004 at 16:13:48, Sandro Necchi wrote:
>
>>On October 13, 2004 at 15:51:02, Robert Hyatt wrote:
>>
>>>On October 13, 2004 at 11:11:47, Graham Laight wrote:
>>>
>>>>On October 13, 2004 at 10:55:20, Michael Yee wrote:
>>>>
>>>>>On October 13, 2004 at 10:42:08, Graham Laight wrote:
>>>>>>On October 13, 2004 at 10:33:30, Michael Yee wrote:
>>>>>
>>>>>>>have 1 "bad" (or underperforming) tournament out of 20, i.e., with low
>>>>>>>probability. But the rare event *will* (or could) happen at some point.
>>>>>>
>>>>>>Please see the answer I gave in
>>>>>>http://www.talkchess.com/forums/1/message.html?391399
>>>>>>
>>>>>>-g
>>>>>>
>>>>>>>Michael
>>>>>
>>>>>No offense, but I don't think I understand what your point is. Your simulation
>>>>
>>>>My points (made throughout the thread - not just in the previous post in this
>>>>branch of the thread) are:
>>>>
>>>>1. Given the Hydra and Fritz results, the Junior result is unexpectedly low
>>>
>>>
>>>What would you do if you took four humans, and four copies of fritz or hydra and
>>>played the _same_ event again?  And what would you say if one of the copies of
>>>Fritz produced 3 draws and a loss?  "It did poorly?"  Or "unexpected random
>>>chance?"
>>>
>>>It is almost a certainty that all 4 copies would _not_ produce the same
>>>result...
>>>
>>Bob,
>>
>>you are correct but we are "old-fashioned". I hope you do not get upset; I mean we
>>deeply analyze things and try to give explanations for things.
>>
>>The "modern" way is to give very quick estimations/evaluations based on scores
>>from a limited number of games and/or events.
>>
>>This is why one program can go from the top to the lowest level, and the other way
>>around, so easily. This happens in sports too.
>>
>>Of course not everybody thinks this way, but more and more people seem to,
>>probably because understanding things needs a lot of specific knowledge and
>>experience, which require a lot of time that people do not have or are not willing
>>to invest. So it is easier to make fast comments; not so easy to
>>make deep analysis...
>
>In this case, you're not correct, I'm afraid. Unless you do the maths, you could
>easily fail to realise that a small sample is actually giving you more
>information than a large sample. For an example of this, please see
>http://www.talkchess.com/forums/1/message.html?391534
>
>-g

If you only use maths, you will not get very far estimating chess games and
matches.

I believe a large number of games is more reliable, but the games should also be
checked and understood, as well as what lies behind them, especially where human
players are involved.
I mean that in this case you also need to consider that the GMs who faced the
computers did not have enough experience playing against them, and this had an
influence on the final score.

As for studying the games, it is clear that if a bad variation is biasing the
score too much, this needs to be taken into consideration.

I mean we need to use a smart way to read data and not only maths...

A strong GM normally selects about 3 best moves on average in any position, based
on knowledge and experience, and does not allocate a match percentage to each move
based on maths or percentages only.

Sandro
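
To put a rough number on the "large number of games is more reliable" point, here is a minimal sketch, assuming every game is independent and has the same win/draw/loss probabilities. The function name and the example probabilities (25% wins, 50% draws) are assumptions chosen only for illustration; they are not taken from the Bilbao games.

```python
import math

def score_standard_error(n_games, p_win, p_draw):
    """Standard error of the average score per game (win=1, draw=0.5, loss=0).

    Assumes independent games with fixed, hypothetical probabilities.
    """
    mean = p_win + 0.5 * p_draw            # expected score per game
    second_moment = p_win + 0.25 * p_draw  # E[score^2] per game
    variance = second_moment - mean ** 2
    return math.sqrt(variance / n_games)

# Uncertainty of the score fraction for matches of different lengths
for n in (4, 8, 100, 1000):
    print(n, round(score_standard_error(n, 0.25, 0.5), 3))
```

Under these assumed probabilities the uncertainty on a 4-game score is about 0.18 of the score fraction, against about 0.035 for 100 games. It says nothing about opponent experience or opening choice, which still have to be judged from the games themselves.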
>
>>Sandro
>>>
>>>>
>>>>2. The Hydra and Fritz results taken together are an indication of great
>>>>strength
>>>>
>>>>>(or even just a basic probability calculation) shows that a "low" score for an
>>>>>engine that is assumed to have a certain strength is a rare event. I don't
>>>>>disagree with that. I'm just confused about what conclusions you're trying to
>>>>>draw from witnessing a rare event.
>>>>>
>>>>>Here's how I might put bilbao in perspective: Suppose we are looking at this
>>>>>tournament as simply one in a stream of tournaments, and we consider updating
>>>>>junior's rating (i.e., strength estimate) in a bayesian way. Then junior's past
>>>>>results would weigh much more heavily than this one new result and the rating
>>>>>wouldn't change by much.
>>>>>
>>>>>What would I conclude? Probably that junior had a (slightly) rare result.
>>>>
>>>>The Junior result is probably not too far away from what you'd expect. Perhaps I
>>>>have been looking in astonishment at the wrong place. Perhaps the astonishment
>>>>should be focused upon the 7/8 score which Hydra and Fritz achieved - which is
>>>>highly improbable (I calculated 1/160 in another post in this thread) unless
>>>>these two computers are substantially better than the opponents that they faced
>>>>at Bilbao.
>>>>
>>>>-g
>>>>
>>>>>Michael
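
On the 7/8 score discussed in the quoted text above, here is a minimal sketch of how such a probability can be computed, assuming all games are independent with the same win/draw/loss probabilities. The function names and the example probabilities are assumptions for illustration only. This is not the calculation behind the 1/160 figure, whose assumptions are not given in this post.

```python
def score_distribution(n_games, p_win, p_draw):
    """Return {half_points: probability} for the total score after n_games.

    Scores are counted in half-points (win=2, draw=1, loss=0) so that
    all totals stay integers.
    """
    p_loss = 1.0 - p_win - p_draw
    dist = {0: 1.0}
    for _ in range(n_games):
        nxt = {}
        for half_points, prob in dist.items():
            for gain, p in ((2, p_win), (1, p_draw), (0, p_loss)):
                key = half_points + gain
                nxt[key] = nxt.get(key, 0.0) + prob * p
        dist = nxt
    return dist

def prob_score_at_least(n_games, target_points, p_win, p_draw):
    """Probability of scoring target_points or more in n_games."""
    threshold = int(round(2 * target_points))
    dist = score_distribution(n_games, p_win, p_draw)
    return sum(p for hp, p in dist.items() if hp >= threshold)

# Hypothetical evenly matched sides, half of all games drawn
print(prob_score_at_least(8, 7, p_win=0.25, p_draw=0.5))
# Hypothetical evenly matched sides, no draws at all
print(prob_score_at_least(8, 7, p_win=0.5, p_draw=0.0))
```

With the first set of assumed numbers the chance of 7/8 or better is 137/65536, roughly 1 in 478; with no draws it is 9/256, roughly 1 in 28. The result depends heavily on the assumed draw rate, so any single figure should be read together with the assumptions behind it.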




Last modified: Thu, 15 Apr 21 08:11:13 -0700
