Computer Chess Club Archives


Subject: Re: 10 games is not enough! Remember Fritz Junior match? Of course

Author: Christophe Theron

Date: 13:43:04 07/05/01

On July 05, 2001 at 07:45:09, Harald Faber wrote:

>On July 05, 2001 at 06:10:27, Torstein Hall wrote:
>
>>On July 05, 2001 at 04:20:40, Harald Faber wrote:
>>
>>>A disappointing result for the GambitTiger fans. Such a clear and justified loss
>>>hasn't happened before. I wouldn't explain the 2.5-7.5 by the statistical margin.
>>>Looking at the games, I'd expect another Shredder win if the match were
>>>repeated. Maybe not by as much, but a 6-4 would fit my forecast. Shredder really
>>>played some fine games, see for yourself. You can find the games at
>>>http://www.geocities.com/Harald1312/HaraldFaberE.html.
>>>
>>>Next match is versus Hiarcs 7.32, which was already very hard for ChessTiger to
>>>beat. The first game ended in a draw; in the second one both programs show
>>>a +2 in favour of Hiarcs, but it is a rook and pawn ending, so maybe GambitTiger
>>>can reach a draw.
>>
>>10 games are not enough to say anything for sure about program strength!
>
>I know. But take a look at the games, don't you agree that Shredder played
>convincingly?


Take only the won games of a program and you are always going to think it won
convincingly.

This does not apply only to Tiger or Shredder; it's a general rule.



> And of course I wouldn't dare say that Shredder is stronger
>than GambitTiger after only 10 games. I would want at least 50 games to be on
>the right track.
>
>>I'm not sure how many games you would need to be 99% sure that one program is
>>stronger than another. Perhaps 200-300?
>>
>>Torstein
>
>Don't know exactly, but 50-100 should be enough to get a good approximation. I'd
>be interested in a statistic that shows the average gain/loss of ELO in the SSDF
>between 100 and 300 or 500 games.


I use the following tables. They are not very accurate because they assume
equal 1/3 chances of a win, a draw and a loss, but they give a good
approximation. At least it is better than just guessing.
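The post does not say exactly how the tables were produced. As a sketch, assuming a normal approximation to the match score and the same 1/3 win/draw/loss split, the margin of error can be computed as below. The function names are hypothetical, and the Elo conversion uses the standard logistic Elo formula. Near 50% that curve rises by roughly 7 Elo points per percentage point, which appears to be the conversion behind the tables; for wide margins (such as a 10-game match) the logistic curve gives a somewhat larger Elo figure.

```python
import math
from statistics import NormalDist

# Assumption from the post: each game is a win, draw or loss with probability
# 1/3 each. Scored 1, 0.5 and 0, the deviations from the mean score of 0.5 are
# +0.5, 0 and -0.5, so the per-game variance is 1/6.
PER_GAME_VAR = (1 / 3) * 0.5**2 + (1 / 3) * 0.0**2 + (1 / 3) * (-0.5) ** 2
PER_GAME_SD = math.sqrt(PER_GAME_VAR)  # about 0.408


def score_margin(games: int, confidence: float) -> float:
    """Two-sided margin of error on the score fraction (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    return z * PER_GAME_SD / math.sqrt(games)


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p (logistic Elo model)."""
    return -400 * math.log10(1 / p - 1)


def elo_margin(games: int, confidence: float) -> float:
    """Margin of error converted to Elo points, measured from an even score."""
    return elo_from_score(0.5 + score_margin(games, confidence))


if __name__ == "__main__":
    # Print a 90% confidence table in the same spirit as the one in the post.
    for n in (10, 20, 30, 50, 100, 200, 400, 800, 1600):
        print(f"{n:5d}  {100 * score_margin(n, 0.90):5.2f}%  "
              f"{elo_margin(n, 0.90):4.0f} Elo")
```

Under these assumptions `score_margin(30, 0.90)` comes out at about 12.3%, close to the 12% listed for 30 games at 90% confidence. At 10 games the tables differ a little more from this sketch, probably because a 10-game score can only move in steps of 5%.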



Reliability of chess matches
(assuming each opponent has a 1/3 chance to win, 1/3 to lose and 1/3 to draw)

90% confidence
Games   Score +/- (%)   Elo +/- (pts)
   10      20              140
   20      15              105
   25      14               98
   30      12               84
   40      10               70
   50       9               63
  100       6.5             45
  200       4.72            33
  400       3.34            23
  600       2.66            19
  800       2.39            17
 1000       2.12            15
 1200       2.00            14
 1400       1.81            13
 1600       1.66            12

80% confidence
Games   Score +/- (%)   Elo +/- (pts)
   10      15              105
   20      11               77
   25      10               70
   30       9               63
   40       8               56
   50       7               49
  100       5.0             35
  200       3.75            26
  400       2.60            18
  600       2.15            15
  800       1.86            13
 1000       1.66            12
 1200       1.46            10
 1400       1.40            10
 1600       1.34             9

70% confidence
Games   Score +/- (%)   Elo +/- (pts)
   10      15              105
   20      10               70
   25       8               56
   30       8               56
   40       6.3             44
   50       6.0             42
  100       4.0             28
  200       3.0             21
  400       2.2             15
  600       1.7             12
  800       1.5             11
 1000       1.3              9
 1200       1.24             9
 1400       1.14             8
 1600       1.04             7



So a result of 7.5-2.5 is a 75% score, and with 90% confidence you can say that
it is significant: 50% plus 20% (the margin of error at 90% confidence for a
10-game match) gives 70%, and since the actual result is above that, the match
is statistically significant.

However, this is only 90% confidence: if you conclude from such a result that
one program is clearly better than the other, you will be wrong in about 10% of
the cases.

If you assume that Tiger and Shredder are 70 Elo points apart or less, then you
need to play AT LEAST 40 games to show it with 90% confidence.

If you assume they are 35 Elo points apart or less, you'll need AT LEAST 200
games (if you want 90% confidence).

To resolve a difference of 23 Elo points, be ready to play 400 games.

And so on: just pick the line you need from the tables.
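Reading the tables backwards, the number of games needed to detect a given Elo gap can also be estimated directly. The helpers below are hypothetical and rest on the same assumptions as the tables (1/3 win/draw/loss, normal approximation) plus the logistic Elo scale; this is a sketch, not the procedure used for the tables. Because the tables round their margins, the sketch lands a little above the table's round numbers.

```python
import math
from statistics import NormalDist

# Same assumption as the tables: 1/3 win, 1/3 draw, 1/3 loss per game,
# giving a per-game standard deviation of sqrt(1/6) on the score.
PER_GAME_SD = math.sqrt(1 / 6)


def z_value(confidence: float) -> float:
    """Two-sided normal quantile, e.g. about 1.645 for 90% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def score_from_elo(diff: float) -> float:
    """Expected score for an Elo advantage `diff` (logistic Elo model)."""
    return 1 / (1 + 10 ** (-diff / 400))


def games_needed(elo_diff: float, confidence: float = 0.90) -> int:
    """Smallest match length whose margin of error is no larger than the
    score edge that `elo_diff` implies. `elo_diff` must be positive."""
    edge = score_from_elo(elo_diff) - 0.5
    return math.ceil((z_value(confidence) * PER_GAME_SD / edge) ** 2)


def is_significant(points: float, games: int, confidence: float = 0.90) -> bool:
    """True if the match score lies outside 50% +/- the margin of error."""
    margin = z_value(confidence) * PER_GAME_SD / math.sqrt(games)
    return abs(points / games - 0.5) > margin


if __name__ == "__main__":
    for diff in (70, 35, 23):
        print(f"{diff:3d} Elo -> about {games_needed(diff)} games at 90%")
    print(is_significant(7.5, 10))  # the 7.5-2.5 match discussed above
```

For example, `games_needed(70)` returns 46 under these assumptions, a little above the 40 games quoted above, and `is_significant(7.5, 10)` returns True, in line with the 7.5-2.5 reasoning.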



    Christophe



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.