Computer Chess Club Archives


Subject: Re: 10 games is not enough! Remember Fritz Junior match? Of course

Author: Harald Faber

Date: 22:06:54 07/05/01


On July 05, 2001 at 16:43:04, Christophe Theron wrote:

>On July 05, 2001 at 07:45:09, Harald Faber wrote:
>
>>On July 05, 2001 at 06:10:27, Torstein Hall wrote:
>>
>>>On July 05, 2001 at 04:20:40, Harald Faber wrote:
>>>
>>>>A disappointing result for the GambitTiger fans. Such a clear and justified loss
>>>>hasn't happened before. I wouldn't attribute the 2.5-7.5 to the statistical margin of error.
>>>>Looking at the games, I'd expect Shredder to win again if the match were
>>>>repeated. Maybe not by as much, but a 6-4 would fit my forecast. Shredder really
>>>>played some fine games, see yourself. You find the games at
>>>>http://www.geocities.com/Harald1312/HaraldFaberE.html.
>>>>
>>>>Next match is versus Hiarcs 7.32, which ChessTiger already found very hard to
>>>>beat. The first game ended in a draw; in the second one both programs show
>>>>a +2 in favour of Hiarcs, but it is a rook and pawn ending, so maybe GambitTiger
>>>>can reach a draw.
>>>
>>>10 games is not enough to say anything for sure about program strength!
>>
>>I know. But take a look at the games: don't you agree that Shredder played
>>convincingly?
>
>
>Take only the won games of a program and you are always going to think it won
>convincingly.


There haven't been many won games by Tiger... :-)


>> And of course I wouldn't dare to speak of Shredder being stronger
>>than GambitTiger after only 10 games. I would want at least 50 games to be on
>>the right track.
>>
>>>I'm not sure how many games you would need to be 99% sure that one program is
>>>stronger than another. Perhaps 200-300?
>>>
>>>Torstein
>>
>>Don't know exactly, but 50-100 should be enough to get a good approximation. I'd
>>be interested in a statistic that shows the average change in SSDF Elo ratings
>>between 100 games and 300 or 500 games.
>
>
>I use the following tables. These tables are not very accurate because they
>assume 1/3 chances for win/draw/loss, but they give you a good approximation.
>At least it is better than only guessing.
>
>
>
>Reliability of chess matches
>(assuming each opponent has 1/3 chances to win, 1/3 to lose and 1/3 to draw)
>
>90% confidence
>Games  Score +/- (%)  Elo +/- (pts)
>   10      20             140
>   20      15             105
>   25      14              98
>   30      12              63
>   40      10              70
>   50       9              56
>  100       6.5            35
>  200       4.72           33
>  400       3.34           23
>  600       2.66           19
>  800       2.39           17
> 1000       2.12           15
>
>
>So a result of 7.5-2.5 is 75%, and with 90% confidence you can say that it is
>significant: the margin of error for a 10-game match at 90% confidence is 20%,
>so 50%+20% gives 70%, and since the actual result is above that, the match has
>statistical significance.
>
>However, this is only 90% confidence, so if you claim that the result clearly
>shows that one program is better than the other, you'll end up being wrong in
>10% of the cases.
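The arithmetic behind a table like this can be sketched in a few lines of Python. This is a minimal sketch, assuming the same 1/3 win, 1/3 draw, 1/3 loss model, a normal approximation, the two-sided 90% quantile z = 1.645 and the usual logistic Elo formula; the function names are hypothetical, not taken from any program mentioned here. Under these assumptions the score column comes out close to the posted one (about 21% for 10 games, 2.1% for 1000 games), while the Elo column, computed as the Elo equivalent of 50% plus the margin, differs from a few posted rows (6.5%, for example, converts to roughly 45 points rather than 35).

```python
import math

# Assumption (from the post): each game is a win, draw or loss with
# probability 1/3 each, so one game scores 1, 0.5 or 0 around a mean of 0.5.
PER_GAME_SD = math.sqrt(((1 - 0.5) ** 2 + (0.5 - 0.5) ** 2 + (0 - 0.5) ** 2) / 3)

Z_90 = 1.645  # two-sided 90% quantile of the normal distribution


def score_margin(games: int, z: float = Z_90) -> float:
    """Half-width of the confidence interval on the match score, as a fraction."""
    return z * PER_GAME_SD / math.sqrt(games)


def elo_from_score(score: float) -> float:
    """Logistic Elo difference corresponding to an expected score (0 < score < 1)."""
    return 400 * math.log10(score / (1 - score))


def is_significant(points: float, games: int, z: float = Z_90) -> bool:
    """True if the match score lies outside 50% +/- the margin for this length."""
    return abs(points / games - 0.5) > score_margin(games, z)


if __name__ == "__main__":
    print("Games  Score +/- (%)  Elo +/- (pts)")
    for n in (10, 20, 25, 30, 40, 50, 100, 200, 400, 600, 800, 1000):
        m = score_margin(n)
        print(f"{n:5d}  {100 * m:13.2f}  {elo_from_score(0.5 + m):13.0f}")
    # The 7.5-2.5 match from this thread.
    print("7.5/10 significant:", is_significant(7.5, 10))
```

With these assumptions `is_significant(7.5, 10)` is true, because 75% lies above 50% plus a margin of roughly 21%, while a 6-4 result would not be significant.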


You know I wouldn't say that after 10 games. Not after 20 or even 50 either,
especially not if one can assume that the two programs in the match are of almost
equal strength. But the games give an impression, and often you can see from the
games what is happening on the board.


>If you assume that Tiger and Shredder are 70 Elo points apart or less, then you
>need to play AT LEAST 40 games to show it with 90% confidence.
>
>If you assume they are 35 Elo points apart or less, you'll need AT LEAST 100
>games (if you want 90% confidence).
>
>To detect a 23 Elo point difference, be ready to play 400 games.
>
>And so on... just pick the line you wish in the tables.
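The reverse question, how many games are needed for a given Elo gap, can be answered by solving the same margin formula for the number of games. This is a minimal sketch under the same assumptions (1/3 win/draw/loss, normal approximation, logistic Elo); `expected_score`, `games_needed` and the z values are illustrative. Like the table, it only asks for the margin of error to shrink to the size of the gap, which is not a full statistical power calculation. With these assumptions it gives about 46 games for 70 points and about 413 for 23 points, in line with the table, but about 179 for 35 points, since a 6.5% score margin corresponds to roughly 45 Elo points rather than 35. Passing `confidence=0.99` addresses the 99% question raised earlier in the thread.

```python
import math

# Same assumption as the table: win, draw and loss each have probability 1/3,
# which gives a per-game score standard deviation of sqrt(1/6).
PER_GAME_SD = math.sqrt(1 / 6)

# Two-sided normal quantiles for a few common confidence levels.
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


def games_needed(elo_diff: float, confidence: float = 0.90) -> int:
    """Smallest match length whose score margin is no larger than the Elo gap."""
    margin = expected_score(elo_diff) - 0.5  # the gap, on the score scale
    return math.ceil((Z[confidence] * PER_GAME_SD / margin) ** 2)


if __name__ == "__main__":
    print("Elo gap  games (90%)  games (99%)")
    for diff in (70, 35, 23):
        print(f"{diff:7d}  {games_needed(diff):11d}  {games_needed(diff, 0.99):11d}")
```

Because the required number of games grows with the inverse square of the gap, halving the Elo difference roughly quadruples the match length.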


Thanks for the table! Good work!


>    Christophe



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.