Computer Chess Club Archives



Subject: Re: Dummy Cadaques Tournament (Long)

Author: Enrique Irazoqui

Date: 11:11:01 01/29/00



On January 29, 2000 at 13:00:43, Christophe Theron wrote:

>On January 29, 2000 at 08:54:47, Enrique Irazoqui wrote:
>
>>On January 29, 2000 at 00:28:59, Christophe Theron wrote:
>>
>>From my own 1999 tournament, again with few, short matches:
>>
>>Junior 5    - Genius 6      3.5-6.5
>>Junior 5    - Tiger 11.75   6.5-3.5
>>Tiger 11.75 - Genius 6        6-4
>>
>>Junior 5    = 10
>>Genius 6    = 10.5
>>Tiger 11.75 = 9.5
>>
>>I don't know precisely how often this happens, but I have seen non-transitivity
>>a number of times, including in much longer matches.
>
>
>So show me evidence from longer matches, because the example you give has little
>statistical relevance.

I would have to dig deep to find examples. I prefer to pass you the ball: show
me the irrefutable evidence of transitivity.
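The cycle in the three mini-matches quoted above can be checked mechanically. The following is a hypothetical helper, not something either poster used: it totals each program's points and looks for 3-cycles among the head-to-head winners. The names `RESULTS`, `totals`, `winners` and `intransitive_triples` are made up for illustration.

```python
from itertools import permutations

# Hypothetical encoding of the 1999 mini-matches quoted above:
# (program_a, program_b, points_a, points_b), 10 games per match.
RESULTS = [
    ("Junior 5", "Genius 6", 3.5, 6.5),
    ("Junior 5", "Tiger 11.75", 6.5, 3.5),
    ("Tiger 11.75", "Genius 6", 6.0, 4.0),
]


def totals(results):
    """Sum each program's points over all of its matches."""
    points = {}
    for a, b, pa, pb in results:
        points[a] = points.get(a, 0.0) + pa
        points[b] = points.get(b, 0.0) + pb
    return points


def winners(results):
    """Return the (winner, loser) pairs of all matches that were not tied."""
    edges = set()
    for a, b, pa, pb in results:
        if pa > pb:
            edges.add((a, b))
        elif pb > pa:
            edges.add((b, a))
    return edges


def intransitive_triples(results):
    """List every cycle x beats y, y beats z, z beats x (each reported once)."""
    edges = winners(results)
    players = sorted({p for a, b, _, _ in results for p in (a, b)})
    found = set()
    for x, y, z in permutations(players, 3):
        if (x, y) in edges and (y, z) in edges and (z, x) in edges:
            cycle = (x, y, z)
            # Rotate so the alphabetically smallest name comes first,
            # which makes the three rotations of one cycle identical.
            i = cycle.index(min(cycle))
            found.add(cycle[i:] + cycle[:i])
    return sorted(found)


if __name__ == "__main__":
    print(totals(RESULTS))
    print(intransitive_triples(RESULTS))
```

With the figures above it reports totals of 10, 10.5 and 9.5 and the single cycle Genius 6 > Junior 5 > Tiger 11.75 > Genius 6. It says nothing about whether that cycle is statistically meaningful, which is exactly what the thread disputes.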

>>In this example there are two cases of > 80% probability that end up being wrong.
>>What I mean is that I wouldn't bet a penny on 80% probabilities.
>
>
>I have never advocated for an 80% confidence.
>
>It was just an example. The table I have given is OK for 80% confidence.
>
>You can build such a table for 95% confidence, or for 99.9% confidence.
>
>Still, my point is valid. With a small number of games you cannot tell which
>program is stronger, unless there is a big difference between them.

Here I agree.
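Christophe's 80% table is not reproduced in the thread, so the following is only an assumption about how such a figure can be obtained: a normal approximation in which each game is an independent sample with a standard deviation of at most 0.5 (the no-draw worst case). `superiority_confidence` is a hypothetical name.

```python
import math


def superiority_confidence(points, games, sd_per_game=0.5):
    """Normal-approximation confidence that the side scoring `points` is stronger.

    Assumes independent games and a per-game standard deviation of
    `sd_per_game` (0.5 is the upper bound, reached when no game is drawn).
    Returns Phi(z), where z is how many standard errors the score lies
    above an even 50% result.
    """
    fraction = points / games
    standard_error = sd_per_game / math.sqrt(games)
    z = (fraction - 0.5) / standard_error
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for points in (6.5, 6.0):
        print(points, "/ 10:", round(superiority_confidence(points, 10), 3))
```

Under these assumptions a 6.5-3.5 result gives roughly 0.83 and a 6-4 result roughly 0.74, so only the two 6.5-3.5 matches clear 80%, consistent with the count of two given above. Draws lower the per-game standard deviation and would raise both numbers.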

>For 95% confidence, this is even worse. You can deduce almost nothing from a
>10-game match if you want to be 95% sure.
>
>>I still don't think that a match can determine which of two programs is the
>>stronger, unless, of course, the end result is a real smash, on the order of 90%
>>or so, after a long series of games.
>
>
>I'm surprised that you think so.
>
>A match that is long enough can tell which program is the stronger. The length of
>the match that is needed is a function of the Elo difference between the
>programs.
>
>With a big difference, a relatively small number of games will be enough.
>
>If the difference is tiny, a very high number of games is needed.

None of this applies if there is no transitivity. Program A can beat program B
600-400 without necessarily being better. We first have to settle the
transitivity issue.
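Christophe's statement that the required match length depends on the Elo difference can be sketched with the standard logistic Elo expectation and a normal approximation. This is a rough sizing rule under stated assumptions (independent games, per-game standard deviation 0.5, one-sided test), not a procedure from the thread; `expected_score` and `games_needed` are hypothetical names. It measures only the head-to-head result and so does not address the transitivity objection.

```python
import math
from statistics import NormalDist


def expected_score(elo_diff):
    """Logistic Elo model: expected score of the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, confidence=0.95, sd_per_game=0.5):
    """Games until the expected margin over 50% equals z standard errors.

    Assumes independent games with per-game standard deviation sd_per_game
    and a one-sided test at the given confidence level.
    """
    if elo_diff <= 0:
        raise ValueError("elo_diff must be positive")
    z = NormalDist().inv_cdf(confidence)
    margin = expected_score(elo_diff) - 0.5
    return math.ceil((z * sd_per_game / margin) ** 2)


if __name__ == "__main__":
    for diff in (10, 50, 200):
        print(diff, "Elo:", games_needed(diff), "games")
```

It gives about a dozen games for a 200-point gap, over a hundred for 50 points and over three thousand for 10 points. At exactly this length the expected score only just reaches the threshold, so a real test would need more games to detect the difference reliably.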

>>It would take too long to prove it, and you can always argue that transitivity
>>has never been proven and we still love to believe in these things. But as far
>>as beliefs go, I do believe that there is such a thing as non-transitivity. I
>>have seen it, I suspect it exists...
>
>
>I thought you were not somebody who likes to live without knowing.

Precisely. That's why I question your point.

>For example, you have established a test suite that predicts the SSDF results
>very accurately.

Not any more, I think. Junior 6 and Shredder 4 seem much stronger than my test
indicates. You see, it is worth continuing to question statistics,
including one's own.

>By doing so you implicitly admit that you believe in statistics.

It is a tool, of course, and I never denied it. I am only keeping a critical eye
as much as I can manage. It is healthy, no?

>If you don't believe in statistics, you should stop making any reference to Elo
>ratings.

Come on, come on. Where did I say I don't believe in statistics? I have
questions about yours, namely two: the effect of non-transitivity, and Matthias'
point that a game of chess is not necessarily one and only one probabilistic
event. Christophe: your statistics are not clear to me, that's all.

>But if you buy the "statistics package", you buy several things with it. For
>example the fact that a match can be relevant or not, that with a match that is
>long enough you don't need a 90% winning percentage to be sure which program is
>best, and so on...

See above. This is not my point.
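The claim that a long match does not need a 90% score, and the earlier remark that a 10-game match says almost nothing at 95%, can both be read off a minimum-score threshold. This sketch assumes a normal approximation (independent games, per-game standard deviation 0.5, one-sided test); `score_needed` is a hypothetical helper, not a method from the thread.

```python
import math
from statistics import NormalDist


def score_needed(games, confidence=0.95, sd_per_game=0.5):
    """Smallest score fraction that clears a one-sided normal-approximation test.

    Assumes independent games with per-game standard deviation sd_per_game.
    """
    z = NormalDist().inv_cdf(confidence)
    return 0.5 + z * sd_per_game / math.sqrt(games)


if __name__ == "__main__":
    for games in (10, 100, 1000):
        print(games, "games:", f"{score_needed(games):.1%}")
```

This gives about 76% for 10 games, about 58% for 100 and about 53% for 1000. By the same yardstick, the hypothetical 600-400 result mentioned in the thread is far beyond sampling noise; the objection raised against it concerns transitivity, not sample size.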

>When you start to study chess matches from a statistical point of view, you
>discover some numbers that are surprising to common sense. It takes time to get
>used to it, but if you want to understand things better, and I'm sure you want,
>these statistical results are a great help.

Statistics can be of great help and they can be misleading, sure.

>I did not want to hurt your feelings,

??? Why would I think you did? It didn't even cross my mind.

No, no. I take all this as I think I should: as a discussion about a specific
issue.

> but sometimes a little bit of mathematical
>help is welcome.

And keeping a critical eye can do wonders. :)

Enrique

> You know how feelings can be misleading.
>
>Further, I think our feelings can be better "tuned" if we train them on really
>good examples (I mean significant experiments). Then they will help us better in
>unclear experiments.
>
>I mean if you don't know the rules of chess and you are watching a game, you can
>have the feeling that one player is winning when he is not (maybe because you
>are only watching the players' attitudes). When you are an experienced player,
>you will not be fooled by the players' attitudes, for example.
>
>
>
>    Christophe




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.