Computer Chess Club Archives


Subject: Re: Nope -- You have to play..... NO shortcuts. -- Wait, it depends!

Author: Eric Campos

Date: 19:27:37 01/28/00


On January 28, 2000 at 15:01:01, Dann Corbit wrote:

>On January 28, 2000 at 14:40:35, Christophe Theron wrote:
>[snip]
>>After  10 games, if no program wins by 64.0% or more => play on
>>After  20 games, if no program wins by 61.0% or more => play on
>>After  40 games, if no program wins by 58.0% or more => play on
>>After 100 games, if no program wins by 55.0% or more => play on
>>After 200 games, if no program wins by 53.5% or more => play on
>>
>>And so on.
>>
>>If you play two identical programs, you are likely to play on forever. That
>>sounds strange, but it's only logical.
>>
>>And to answer your question, I thought that playing 40 games between Tiger and
>>Diep and 40 games between Tiger and Crafty would be enough, because I think the
>>difference between Tiger and these programs is above 56 elo points.
>Unfortunately, this method does not work.  You always have to play the thousand
>games to know the answer that you would have to play if they were separated by
>one ELO.
>Why?  Because you don't really know if they are separated by 10 ELO or 1000 ELO.

Dann, I agree that you need the thousand(s) of games to tell how much separation
there is between the programs, but I believe that Christophe is only
trying to tell which program is better (at least when playing each other).  This
is very different from trying to measure relative strengths -- in which case I
would agree that there are probably no shortcuts.

In other/better words, trying to predict which program will win most of the time
is a lot different from trying to predict how often one program will beat
another.  From what I've read, I think Christophe is trying to do the former,
and your math is explaining the latter.
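To put rough numbers on the "how much separation" question, here is a small Python sketch. It assumes the usual logistic Elo formula for expected score and uses 0.5 as the per-game standard deviation of the score (the worst case, with no draws and nearly equal programs). "One standard error" is only a crude yardstick, not a formal test, and the function names are my own, made up for illustration.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_for_one_sigma(elo_diff: float) -> int:
    """Rough number of games before one standard error of the match score
    shrinks to the score edge implied by a positive `elo_diff`.

    Uses 0.5 as the per-game standard deviation: the worst case, reached when
    there are no draws and the programs are nearly equal. Draws would lower
    the standard deviation and the game count with it.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((0.5 / edge) ** 2)


for diff in (1, 10, 56, 200):
    print(f"{diff:4d} Elo: score edge {expected_score(diff) - 0.5:.4f}, "
          f"about {games_for_one_sigma(diff)} games for one standard error")
```

Under these assumptions a 56 Elo gap shows up at about one standard error after roughly 40 games, a 10 Elo gap needs about 1,200 games, and a 1 Elo gap needs on the order of 120,000.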

His numbers seem plausible to me (but, I admit, I would have to revisit my
statistics books before trying to pin down precise numbers).
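Here is the kind of check I had in mind, as a Python sketch. It is not necessarily how Christophe derived his table: it assumes no draws (so the per-game standard deviation is 0.5 for equal programs), uses the normal approximation to the binomial, and converts scores to Elo with the logistic formula. The helper names are my own.

```python
import math

# Stopping thresholds quoted from Christophe's post: (games played, score needed).
THRESHOLDS = [(10, 0.640), (20, 0.610), (40, 0.580), (100, 0.550), (200, 0.535)]


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def one_sided_confidence(games: int, score: float) -> float:
    """Normal-approximation probability that a given one of two EQUAL programs
    scores below `score` over `games` games, using the no-draw standard
    error 0.5 / sqrt(games)."""
    std_err = 0.5 / math.sqrt(games)
    return normal_cdf((score - 0.5) / std_err)


for games, score in THRESHOLDS:
    print(f"{games:3d} games at {score:.1%}: "
          f"{elo_from_score(score):5.1f} Elo, "
          f"one-sided confidence {one_sided_confidence(games, score):.1%}")
```

Under those assumptions every row lands at roughly one standard error, i.e. about 81% to 84% one-sided confidence, which fits the 80% level Christophe gave, and the 58% threshold at 40 games corresponds to about 56 Elo. Draws would shrink the standard error, so the real confidence should be somewhat higher.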

In any case, I don't think your coin toss experiment (quoted below) applies here.
You mentioned that "probably several would get 90% or better results".  I agree
-- we can expect (1+10+10+1)/1024, or about 2%, of readers to achieve these
results (hey, please check my math here, I could be off by a factor of two or
so!).  But remember, Christophe gave an 80% confidence level, which implies that
we expect to be wrong about 20% of the time.  So, having several readers get 90%
or better results isn't at all in conflict with Christophe's calculations.

If Christophe had quoted a 99% confidence level, your penny flip experiment
(below) would have been a great counterexample.
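For anyone who wants to check the coin arithmetic, here is a Python sketch using exact fractions. It counts outcomes with at least 9 heads or at least 9 tails in 10 fair flips (the "90% or better" case for either side) and compares that rate with the error budget implied by an 80% and a 99% confidence level. The function name is mine, for illustration only.

```python
from fractions import Fraction
from math import comb


def two_sided_tail(flips: int, min_count: int) -> Fraction:
    """Exact probability that a fair coin gives at least `min_count` heads
    or at least `min_count` tails in `flips` tosses."""
    # The heads side and the tails side only stay disjoint when
    # min_count is a strict majority of the flips.
    if 2 * min_count <= flips:
        raise ValueError("min_count must be more than half of flips")
    one_side = sum(comb(flips, k) for k in range(min_count, flips + 1))
    return Fraction(2 * one_side, 2 ** flips)


p = two_sided_tail(10, 9)  # (1 + 10 + 10 + 1) / 1024
print(f"P(90% or better either way) = {p} = {float(p):.2%}")

for confidence in (0.80, 0.99):
    allowed = 1 - Fraction(str(confidence))  # error budget, e.g. 1/5 for 80%
    verdict = "within" if p <= allowed else "exceeds"
    print(f"{confidence:.0%} confidence: error budget {float(allowed):.0%}, "
          f"coin-flip rate {verdict} it")
```

The exact figure is 22/1024, about 2.1%: comfortably inside the 20% error budget of an 80% level, but more than double the 1% budget of a 99% level.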


>
>Take a penny and flip it ten times.  If every reader of CCC did this, probably
>several would get 90% or better results from a fair coin.  So -- here we did our
>ten trials and got a 90% result.  So we have conclusively PROVEN that heads (or
>tails) is stronger.  ***NOT***
>
>What's wrong?  It is just that a fair contest between equals is
>INDISTINGUISHABLE from a completely lopsided affair.
>
>It always, and I do mean ALWAYS requires the full set of trials to know
>something about the relative strengths.  A blasting in a short series tells us
>almost NOTHING.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.