Computer Chess Club Archives


Subject: Re: what type of result is significant in 100 game match

Author: Uri Blass

Date: 00:40:09 02/19/06


On February 18, 2006 at 14:49:01, Sandro Necchi wrote:

>On February 18, 2006 at 09:52:41, Uri Blass wrote:
>
>>On February 18, 2006 at 05:43:33, Sandro Necchi wrote:
>>
>>>On February 18, 2006 at 03:50:09, Uri Blass wrote:
>>>
>>>>My question is: based on your experience, what is the biggest margin by which
>>>>A has beaten B in a match of 100 games (a Noomen match, or a match based on
>>>>other positions such as Albert Silver's positions) while A was still not better
>>>>than B against other programs?
>>>>
>>>>Of course, with opening books it is possible that engine A has a book that
>>>>kills the book of B even though A is not better against other programs, so we
>>>>can learn nothing from a match with the original books.
>>>>
>>>>I have read a claim that the Chessmaster personality that did better against
>>>>other Chessmaster personalities was not better against other programs, but I do
>>>>not know what score the winner got in that case. I believe that a result of
>>>>90-10 always means that the winner is better from a practical point of view.
>>>>The question is: what is the minimal result from which you can be sure, based
>>>>on practical experience, that the winner is better?
>>>>
>>>>Uri
>>>
>>>well, it depends:
>>>
>>>if you know that the program you are going to start a match against is the only
>>>one that is creating resistance, then a good score against it would be
>>>significant.
>>>
>>>I believe a good score should be at least 75%
>>>
>>>If this version is not stronger than other programs even a score higher than 75%
>>>could mean very little.
>>>
>>>I guess a score of 95% should mean something, but one needs to check if the
>>>score was heavily dependent on the opening book or not.
>>
>>I am surprised by this opinion.
>>Note that I am talking about matches from predefined positions, like the Noomen
>>match.
>>
>>Can you show me a single case in the CEGT when A got 70% against B in at least
>>50 games and still A has worse results than B against other programs?
>
>I do not run tests like CEGT's, as I am not interested in matches where the same
>book or predefined positions are set for all programs. This means that I do not
>check those results either.
>
>I believe this type of test can be misleading and can give a limited amount of
>info, so I prefer tests like the SSDF ones.
>
>I like "realistic" tests and not "hypothetical" tests.

I think that this type of test gives good information about the potential of the
engine (or about its rating, assuming the book makers of both sides are of
similar quality).

Note that you predict Rybka to be 150 Elo better than Shredder 9, and the CEGT
results tend to agree with your estimate.

Rank  Program                        Elo    +   -  Games  Score  Av.Op.  Draws
   1  Rybka 1.01 Beta 13b 64-bit     2923  28  27    580  79.1%    2692  24.3%
   2  Rybka 1.01 Beta 9 64-bit opti  2892  37  37    316  76.9%    2683  24.7%
  14  Shredder 9                     2754   8   8   4816  63.2%    2660  31.9%

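You can see that a 150 Elo difference between Rybka and Shredder is inside the
error bars: the gap in the list is 2923 - 2754 = 169 Elo, and Rybka's rating
alone has an error of +28/-27.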
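
For anyone who wants to check the arithmetic, here is a rough Python sketch. It takes the ratings and error bars from the list above and asks whether a 150 Elo gap falls inside the combined interval. Combining the two error bars in quadrature is my own simplifying assumption (it treats the two ratings as independent, which a rating list does not guarantee), and the function names are made up for this example. The second function is the usual logistic conversion from a match score to an Elo difference.

```python
import math


def elo_gap_interval(elo_a, plus_a, minus_a, elo_b, plus_b, minus_b):
    """Return (low, mid, high) for the rating gap A - B.

    Assumption: the two error bars are independent, so they are
    combined in quadrature. This is a rough check only.
    """
    mid = elo_a - elo_b
    up = math.hypot(plus_a, minus_b)    # A too low and B too high
    down = math.hypot(minus_a, plus_b)  # A too high and B too low
    return mid - down, mid, mid + up


def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400 * math.log10(score / (1 - score))


# Rybka 1.01 Beta 13b 64-bit versus Shredder 9, numbers from the list above.
low, mid, high = elo_gap_interval(2923, 28, 27, 2754, 8, 8)
print(f"gap = {mid} Elo, rough interval = [{low:.0f}, {high:.0f}]")
print("150 Elo inside interval:", low <= 150 <= high)

# A 75% score corresponds to roughly a 191 Elo difference.
print(f"75% score -> {elo_from_score(0.75):.0f} Elo")
```

With these numbers the gap is 169 Elo and the rough interval runs from about 141 to 198, so 150 is inside it.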

Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700
