Computer Chess Club Archives



Subject: Re: what type of result is significant in 100 game match

Author: Sandro Necchi

Date: 01:33:52 02/19/06



On February 19, 2006 at 03:48:15, Uri Blass wrote:

>On February 19, 2006 at 03:40:09, Uri Blass wrote:
>
>>On February 18, 2006 at 14:49:01, Sandro Necchi wrote:
>>
>>>On February 18, 2006 at 09:52:41, Uri Blass wrote:
>>>
>>>>On February 18, 2006 at 05:43:33, Sandro Necchi wrote:
>>>>
>>>>>On February 18, 2006 at 03:50:09, Uri Blass wrote:
>>>>>
>>>>>>My question, based on your experience: what is the biggest result by which A
>>>>>>beat B in a match of 100 games (a Noomen match or a match based on other
>>>>>>positions, like Albert Silver's positions) while A was still not better than
>>>>>>B against other programs?
>>>>>>
>>>>>>Of course, with opening books it is possible that A has a book that kills
>>>>>>the book of B even though A is not better against other programs, so we can
>>>>>>learn nothing from a match with the original books.
>>>>>>
>>>>>>I read a claim that the Chessmaster personality that did better against
>>>>>>other Chessmaster personalities was not better against other programs, but I
>>>>>>do not know what result the winner got while still not being better. I
>>>>>>believe that a result of 90-10 always means that the winner is better from a
>>>>>>practical point of view. The question is what minimal result lets you be
>>>>>>sure, based on practical experience, that the winner is better.
>>>>>>
>>>>>>Uri
>>>>>
>>>>>well, it depends:
>>>>>
>>>>>if you know that the program you are going to start a match against is the only
>>>>>one that is creating resistance, then a good score against it would be
>>>>>significant.
>>>>>
>>>>>I believe a good score should be at least 75%
>>>>>
>>>>>If this version is not stronger than other programs even a score higher than 75%
>>>>>could mean very little.
>>>>>
>>>>>I guess a score of 95% should mean something, but one needs to check whether
>>>>>the score was heavily dependent on the opening book or not.
>>>>
>>>>I am surprised by this opinion.
>>>>Note that I am talking about matches from predefined positions, like a Noomen
>>>>match.
>>>>
>>>>Can you show me a single case in the CEGT when A got 70% against B in at least
>>>>50 games and still A has worse results than B against other programs?
>>>
>>>I do not run tests like CEGT's, as I am not interested in matches where the same
>>>book or predefined positions are set for all programs. This means that I am not
>>>checking these either.
>>>
>>>I believe this type of test can be misleading and can give a limited amount of
>>>info, so I prefer tests like the SSDF ones.
>>>
>>>I like "realistic" tests and not "hypothetical" tests.
>>
>>I think that this type of test gives good information about the potential of the
>>engine (or about the rating, assuming the quality of the book makers on both
>>sides is similar).
>>
>>Note that you predict Rybka to be 150 Elo better than Shredder 9, and the CEGT
>>results tend to agree with your estimate.
>>
>>Rank  Program                         Elo   +   -  Games  Score  Av.Op.  Draws
>>   1  Rybka 1.01 Beta 13b 64-bit     2923  28  27    580  79.1%    2692  24.3%
>>   2  Rybka 1.01 Beta 9 64-bit opti  2892  37  37    316  76.9%    2683  24.7%
>>  14  Shredder 9                     2754   8   8   4816  63.2%    2660  31.9%
>>
>>You see that a 150 Elo difference between Rybka and Shredder is inside the error
>>bars.
>>
>>Uri
>
>I accidentally posted position number 2 instead of position number 3.
>
>If we use the 32-bit version of Rybka, then the difference is only 115 Elo.
>But if we consider that there is going to be additional improvement in Rybka,
>even the 32-bit version of Rybka 1.2 may be 150 Elo better than Shredder 9
>in CEGT, which is similar to what you predict for SSDF.

I know this, but most people speak/write without thinking or asking what is
behind an expectation...

>
>Rank  Program                         Elo   +   -  Games  Score  Av.Op.  Draws
>   3  Rybka 1.01 Beta 13-13b 32-bit  2869  28  28    494  73.9%    2689  27.9%
>
>I see that generally there is no big difference between the results of SSDF and
>CEGT.

I know this, but I prefer to evaluate the complete program and not the engine
alone.
The reason is that the complete program (engine+book+ETB) is what the user will
get.

>
>There are small differences, but I think that usually, when there is a difference
>of more than 50 Elo in one list, the program that is better in that list is also
>better in the other list.

I am not against CEGT at all, but I prefer the SSDF list, because a program
MAY BE developed on the assumption that a good book and ETB are available, and
removing them would handicap it.

>
>Uri

Sandro
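
The arithmetic behind the figures quoted in this thread can be checked with a short sketch. It is not how CEGT or SSDF compute their lists. It assumes the standard logistic Elo model, reads the two numbers after each rating as the "+" and "-" error bars, and simply stacks the error bars of both programs, which is a conservative simplification. The function names are made up for this illustration.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by an expected score in (0, 1).

    Assumes the standard logistic Elo model; a rating list may use
    a different fitting method.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)

def diff_interval(a, b):
    """Return (nominal, low, high) for rating(a) - rating(b).

    a and b are (elo, plus, minus) tuples. The bounds stack both
    error bars, which is a rough simplification, not a proper
    confidence interval for the difference.
    """
    a_elo, a_plus, a_minus = a
    b_elo, b_plus, b_minus = b
    nominal = a_elo - b_elo
    low = (a_elo - a_minus) - (b_elo + b_plus)
    high = (a_elo + a_plus) - (b_elo - b_minus)
    return nominal, low, high

# Ratings and error bars as quoted in the thread.
rybka_64 = (2923, 28, 27)    # Rybka 1.01 Beta 13b 64-bit
rybka_32 = (2869, 28, 28)    # Rybka 1.01 Beta 13-13b 32-bit
shredder_9 = (2754, 8, 8)    # Shredder 9

print(diff_interval(rybka_64, shredder_9))  # (169, 134, 205)
print(diff_interval(rybka_32, shredder_9))  # (115, 79, 151)
print(round(elo_diff_from_score(0.75)))     # 191
```

Under these assumptions, 150 Elo falls inside the 134 to 205 range for the 64-bit version, the 32-bit version is nominally 115 Elo ahead of Shredder 9, and a 75% score corresponds to a gap of roughly 191 Elo.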




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.