Author: Christophe Theron
Date: 21:11:38 01/29/00
On January 29, 2000 at 14:11:01, Enrique Irazoqui wrote:
>On January 29, 2000 at 13:00:43, Christophe Theron wrote:
>
>>On January 29, 2000 at 08:54:47, Enrique Irazoqui wrote:
>>
>>>On January 29, 2000 at 00:28:59, Christophe Theron wrote:
>>>
>>>From my own 1999 tournament, still with short and few matches:
>>>
>>>Junior 5-Genius 6 3.5-6.5
>>>Junior 5-Tiger 11.75 6.5-3.5
>>>Tiger 11.75-Genius 6 6-4
>>>
>>>Junior 5 = 10
>>>Genius 6 = 10.5
>>>Tiger 11.75 = 9.5
>>>
>>>I don't know precisely how often this happens, but I have seen non-transitivity
>>>a number of times, also in much longer matches.
>>
>>
>>So show me evidence with longer matches. Because the example you give has little
>>statistical relevance.
>
>I would have to dig deep to find examples. I prefer to pass you the ball: show
>me the irrefutable evidence of transitivity.
I told you that I think that transitivity does not always apply. Actually I think
it applies in many cases, but there are numerous exceptions (and these exceptions
always reveal something interesting about the programs).
When you think about it, it is very simple to write an opening book for A that
kills the opening book of B, but not the opening book of C. Then you write an
opening book for B that kills C and you get non-transitivity.
It is probably not very difficult either to write three engines in such a way
that A kills B but not C, and B kills C.
So about transitivity I would not disagree much with you, except maybe on how
often it happens, but that is no big deal.
The problem of non-transitivity almost disappears, I think, when a program plays
various opponents. If you rate programs by letting them play in a pool of varied
opponents, the effect becomes negligible.
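To make the pool idea concrete, here is a minimal sketch in Python using the three short matches quoted earlier in this thread. The function names (`pool_totals`, `head_to_head_winners`, `is_three_cycle`) are made up for this illustration. The sketch only shows that the head-to-head results form a cycle while the pool totals still give one ordering; with so few games it says nothing about real strength.

```python
# Match results quoted earlier in this thread:
# (first program, second program, first's points, second's points)
matches = [
    ("Junior 5", "Genius 6", 3.5, 6.5),
    ("Junior 5", "Tiger 11.75", 6.5, 3.5),
    ("Tiger 11.75", "Genius 6", 6.0, 4.0),
]

def pool_totals(matches):
    """Sum every program's points over all of its matches in the pool."""
    totals = {}
    for first, second, first_pts, second_pts in matches:
        totals[first] = totals.get(first, 0.0) + first_pts
        totals[second] = totals.get(second, 0.0) + second_pts
    return totals

def head_to_head_winners(matches):
    """Return (winner, loser) pairs for every match that was not tied."""
    pairs = []
    for first, second, first_pts, second_pts in matches:
        if first_pts > second_pts:
            pairs.append((first, second))
        elif second_pts > first_pts:
            pairs.append((second, first))
    return pairs

def is_three_cycle(pairs):
    """True if three decided matches among three programs form a cycle,
    i.e. every program won exactly one match and lost exactly one."""
    winners = {winner for winner, _ in pairs}
    losers = {loser for _, loser in pairs}
    return len(pairs) == 3 and len(winners) == 3 and len(losers) == 3

totals = pool_totals(matches)
pairs = head_to_head_winners(matches)

print("non-transitive cycle:", is_three_cycle(pairs))
for name, points in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name}: {points}")
```

Each program wins one match and loses one, so no head-to-head ordering exists, yet the totals against the pool still rank the three programs.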
>>>In this example there are 2 cases of > 80% probability that end up being wrong.
>>>With this I mean to say that I don't trust 80% probabilities for a penny.
>>
>>
>>I have never advocated an 80% confidence level.
>>
>>It was just an example. The table I have given is OK for 80% confidence.
>>
>>You can build such a table for 95% confidence, or for 99.9% confidence.
>>
>>Still, my point is valid. With a small number of games you cannot tell which
>>program is stronger, unless there is a big difference between them.
>
>Here I agree.
Great!
>>For 95% confidence, this is even worse. You can deduce almost nothing from a 10
>>games match if you want to be 95% sure.
>>
>>
>>
>>
>>>I still don't think that a match can determine which one of 2 programs is the
>>>strongest, unless, of course, the end result is a real smash of the order of 90%
>>>or so after a long series of games.
>>
>>
>>I'm surprised that you think so.
>>
>>A match that is long enough can tell which program is the best. The length of
>>the match that is needed is a function of the elo difference between the
>>programs.
>>
>>With a big difference, a relatively small number of games will be enough.
>>
>>If the difference is tiny, a very high number of games is needed.
>
>Nothing of this applies if there is no transitivity. Program A can beat 600-400
>program B without being necessarily better. We have to solve first the
>transitivity issue.
OK, I see your point.
Still, this is not a problem. When you rate a program, you average the various
performances it has achieved against various opponents in its pool. This in my
opinion helps to get rid of the non-transitivity effect.
So there is no contradiction in my opinion to say that prog A is 100 elo above
prog B, prog B is 100 elo above prog C, but prog A is only 100 elo above prog C.
Because these are not the same "elos". The elo difference of a program against
another one is not the same as the elo difference of this program against the
pool.
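As a rough illustration of why the A/B/C numbers above are not contradictory, here is a small sketch in Python. It assumes the commonly used logistic Elo formula; the post does not say which formula the tables are based on, so treat that as an assumption. The function names and the 100-point figures for A, B and C are hypothetical, taken only from the example in the text.

```python
import math

def expected_score(elo_diff):
    """Expected score (0..1) for a given Elo advantage, logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Inverse of expected_score: Elo difference implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Hypothetical head-to-head results for the A/B/C example in the text.
a_vs_b = expected_score(100)                # A scores about 64% against B
b_vs_c = expected_score(100)                # B scores about 64% against C
a_vs_c_if_transitive = expected_score(200)  # about 76% if the elos simply added up
a_vs_c_observed = expected_score(100)       # the example says A is only +100 over C

print(f"A vs B: {a_vs_b:.3f}")
print(f"B vs C: {b_vs_c:.3f}")
print(f"A vs C if elos added up: {a_vs_c_if_transitive:.3f}")
print(f"A vs C in the example:   {a_vs_c_observed:.3f}")
print(f"elo implied by a 64% score: {elo_diff_from_score(0.64):.0f}")
```

The head-to-head differences are each computed from one pairing only, so nothing forces them to add up. A rating against the pool would average over all pairings instead.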
So when you talk about relative elos of programs when they play each other, you
can safely ignore the transitivity issue (the question of transitivity only
arises when at least 3 programs are involved). In this case, my tables (or other
tables computed with higher confidence percentages) simply apply.
These tables are useful to say if the match between two programs has been
meaningful or not.
But I agree that you cannot derive an absolute elo rating difference from my
table. So you cannot ignore the performances that the program will get against
the other programs.
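For readers who want to check a match result themselves, here is a minimal sketch in Python of one way to estimate whether a match between two programs was meaningful. It is not the method used to build the tables mentioned above (the post does not describe it). It assumes a plain normal approximation, which is crude for very short matches, and `match_confidence` is a name invented for this example.

```python
import math
from statistics import NormalDist

def match_confidence(wins, draws, losses):
    """Rough one-sided confidence (0..1) that the first program's true
    expected score is above 50%, using a normal approximation."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the score (1 for a win, 0.5 for a draw, 0 for a loss).
    variance = (wins * (1.0 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * (0.0 - mean) ** 2) / games
    if variance == 0.0:
        # Every game had the same result; the approximation breaks down.
        return 1.0 if mean > 0.5 else (0.0 if mean < 0.5 else 0.5)
    z = (mean - 0.5) / math.sqrt(variance / games)
    return NormalDist().cdf(z)

# Hypothetical split of a 6.5-3.5 result over 10 games: 6 wins, 1 draw, 3 losses.
short_match = match_confidence(6, 1, 3)
# The 600-400 result mentioned in the thread, assuming no draws.
long_match = match_confidence(600, 0, 400)

print(f"10-game match, 6.5-3.5:  {short_match:.3f}")
print(f"1000-game match, 600-400: {long_match:.6f}")
```

Under these assumptions the 10-game result clears an 80% threshold but not a 95% one, while the 1000-game result is far beyond both. This only says something about the two programs against each other, not about their rating against a pool.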
>>>It would take too long to prove it, and you can always argue that transitivity
>>>has never been proven and we still love to believe in these things. But as far
>>>as beliefs go, I do believe that there is such a thing as non-transitivity. I
>>>have seen it, I suspect it exists...
>>
>>
>>I thought you were not somebody who likes to live without knowing.
>
>Precisely. That's why I question your point.
OK, I hope that we now have fewer points of disagreement?
>>I.e. you have established a test suite that predicts very accurately the SSDF
>>results.
>
>Not any more, I think. Junior 6 and Shredder 4 seem much stronger than my test
>seems to indicate. You see, it is worth it to keep questioning statistics,
>including one's own.
Of course! I question my own methods every day!
>>By doing so you implicitly admit that you believe in statistics.
>
>It is a tool, of course, and I never denied it. I am only keeping a critical eye
>as much as I can manage. It is healthy, no?
Yes it is.
>>If you don't believe in statistics, you should stop making any reference to elo
>>ratings.
>
>Come on, come on. Where did I say I don't believe in statistics? I have
>questions about yours. Namely 2: the effect of non-transitivity and Matthias'
>point about a game of chess being not necessarily one and only one probabilistic
>event. Christophe: I don't have your statistics clear, that's all.
I hope it is clearer now (actually it is clearer for me now too! :)
>>But if you buy the "statistics package", you buy several things with it. For
>>example the fact that a match can be relevant or not, that with a match that is
>>long enough you don't need a 90% winning percentage to be sure which program is
>>best, and so on...
>
>See above. This is not my point.
>
>>When you start to study chess matches from a statistical point of view, you
>>discover some numbers that are surprising for common sense. It takes time to get
>>used to it, but if you want to understand things better, and I'm sure you want,
>>these statistical results are a great help.
>
>Statistics can be of great help and they can be misleading, sure.
>
>>I did not want to hurt your feelings,
>
>??? Why would I think you did? It didn't even pop in my mind.
>
>No, no. I take all this as I think I should: as a discussion about a specific
>issue.
OK. I appreciate talking about this. This discussion made me think more deeply
about this transitivity issue, which is an important point I think.
Christophe
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.