Computer Chess Club Archives


Subject: Re: Mac Hiarcs swats away CT2004: 60+30 10 games

Author: Uri Blass

Date: 02:27:35 01/29/05



On January 29, 2005 at 05:23:13, George Sobala wrote:

>On January 29, 2005 at 04:41:39, Kurt Utzinger wrote:
>
>>On January 29, 2005 at 02:52:43, George Sobala wrote:
>>
>>>On January 29, 2005 at 01:50:24, Kurt Utzinger wrote:
>>>
>>>>      Hi George
>>>>      Your test shows two (well-known) things: first, that
>>>>      10 games are far too few to conclude something from,
>>>>      and second, that the influence of hardware is greatly
>>>>      overestimated -:) Is it worth studying the games? I
>>>>      hope you will continue this match up to 50 games.
>>>>      Kurt [http://www.utzingerk.com]
>>>
>>>I disagree with you about the "conclude something"!
>>>
>>>e.g.
>>>
>>>http://www.fon.hum.uva.nl/Service/Statistics/Sign_Test.html
>>>
>>>using n+ = 5 and n- = 0
>>>
>>>shows that this 10-game match indicates with greater than 90% probability that
>>>Hiarcs 9.6 on an eMac G4 1.25GHz is stronger than CT2004 on a 1.6GHz Centrino at
>>>the 60+30 time control. A 10-game match does not tell us by how much.
>>>
>>>I think it is worth studying the games; I enjoyed playing through them
>>>afterwards.
>>
>>
>>      I very much doubt that it's possible to trust any statistic
>>      based on so little data -:) What if the next 10 games
>>      bring a result of 2,5-7,5 in favour of ChessTiger 2004? This
>>      would be no surprise; I have often seen such swings in computer matches.
>>      Kurt
>
>The sign test is one of the most robust and reliable tests in statistics.
>Perhaps you would enjoy studying some statistical theory, rather than just
>relying on brute-force megamatches! Of course megamatches yield more accurate
>results, but some conclusions can be drawn from less data.
>
>The correct, true and mathematically irrefutable statistical interpretation of
>the result I posted is that if Hiarcs and CT2004 were of equal strength, then
>a result as extreme as +5 -0 =5 (in favour of either engine) would occur only
>about 6% of the time. I.e. in a 200-game match between two equal engines, there
>would be approximately one group of 10 games that contained such an extreme
>result.
>
>Note that there is a statistical difference between +5 -0 =5, +6 -1 =3 and
>+7 -2 =1: the latter two results are less statistically "extreme" (i.e. less
>unlikely). This is actually an interesting observation, as all three results
>give the same Elo difference.
>
>What if the next 10 games bring a result of 2,5-7,5 in favour of ChessTiger
>2004? The statistics thus far indicate that this would be very unlikely. If it
>did happen, then we would have more data to play with!
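The sign-test figures in the thread can be checked with a few lines of code. This is a minimal sketch, assuming the standard exact sign test (draws discarded, each decisive game treated as a fair coin flip under equal strength); the helper names `sign_test_p` and `score` are hypothetical. It shows that +5 -0 =5, +6 -1 =3 and +7 -2 =1 all score 7.5/10 (the middle result needs three draws to total ten games), yet give increasingly large p-values.

```python
from math import comb


def sign_test_p(wins: int, losses: int, two_sided: bool = True) -> float:
    """Exact sign-test p-value for a match; draws are ignored.

    Under the null hypothesis of equal strength, each decisive game is a
    fair coin flip. The one-sided tail is computed for the leading side.
    """
    n = wins + losses
    k = max(wins, losses)
    # Probability of k or more "successes" out of n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_sided else tail


def score(wins: int, losses: int, draws: int) -> float:
    """Match score for the first engine: 1 per win, 0.5 per draw."""
    return wins + 0.5 * draws


if __name__ == "__main__":
    for w, l, d in [(5, 0, 5), (6, 1, 3), (7, 2, 1)]:
        print(f"+{w} -{l} ={d}: score {score(w, l, d)}/10, "
              f"one-sided p = {sign_test_p(w, l, two_sided=False):.4f}, "
              f"two-sided p = {sign_test_p(w, l):.4f}")
```

For +5 -0 =5 the two-sided value is exactly 2/32 = 0.0625, the "about 6%" figure; 20 blocks of 10 games times 0.0625 gives roughly 1.25 such blocks, in line with the 200-game remark.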

The statistics that you use assume that games are independent events.
That is not the case, because engines change their opening choices as a
result of learning.
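The independence point can be illustrated with a small Monte Carlo sketch. Everything in it is a hypothetical model, not a description of Hiarcs or CT2004: two exactly equal engines (25% wins each, 50% draws, made-up figures), and a parameter `repeat_prob` standing in for opening learning, i.e. the chance that a game simply repeats the previous result because the same line is replayed. It counts how often a 10-game block looks at least as extreme as +5 -0 =5 under the two-sided sign test. The absolute rates depend on the assumed draw rate; the point is the comparison between independent and dependent games.

```python
import random
from math import comb


def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided exact sign-test p-value; draws are ignored."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def play_block(rng: random.Random, n_games: int = 10,
               repeat_prob: float = 0.0) -> tuple[int, int]:
    """Simulate one block between two equal engines (hypothetical model).

    A fresh game is +1 (A wins), -1 (B wins) or 0 (draw) with
    probabilities 0.25 / 0.25 / 0.5. With probability repeat_prob a game
    instead repeats the previous result, a crude stand-in for learning.
    The long-run result frequencies stay 0.25 / 0.25 / 0.5 either way.
    """
    results: list[int] = []
    for i in range(n_games):
        if i > 0 and rng.random() < repeat_prob:
            r = results[-1]
        else:
            r = rng.choice((1, -1, 0, 0))
        results.append(r)
    return results.count(1), results.count(-1)


def false_alarm_rate(repeat_prob: float, trials: int = 20000,
                     alpha: float = 0.0625, seed: int = 1) -> float:
    """Fraction of blocks between equal engines with sign-test p <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        wins, losses = play_block(rng, repeat_prob=repeat_prob)
        if sign_test_p(wins, losses) <= alpha + 1e-12:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("independent games:", false_alarm_rate(0.0))
    print("with 'learning':  ", false_alarm_rate(0.5))
```

With `repeat_prob=0.0` the games are independent and the sign test behaves as designed; raising it makes "extreme" blocks between equal engines noticeably more common, which is the sense in which correlated games overstate the evidence.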

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.