Author: Sandro Necchi
Date: 01:33:52 02/19/06
On February 19, 2006 at 03:48:15, Uri Blass wrote:

>On February 19, 2006 at 03:40:09, Uri Blass wrote:
>
>>On February 18, 2006 at 14:49:01, Sandro Necchi wrote:
>>
>>>On February 18, 2006 at 09:52:41, Uri Blass wrote:
>>>
>>>>On February 18, 2006 at 05:43:33, Sandro Necchi wrote:
>>>>
>>>>>On February 18, 2006 at 03:50:09, Uri Blass wrote:
>>>>>
>>>>>>My question is, based on your experience, what is the biggest result by which A beat B
>>>>>>in a match of 100 games (Noomen match or match based on other positions like Albert
>>>>>>Silver's positions) where A is still not better than B against other programs.
>>>>>>
>>>>>>Of course with opening books it is possible that one engine has a book that kills
>>>>>>the book of B when it is not better against other programs, so we can know
>>>>>>nothing from a match with original books.
>>>>>>
>>>>>>I read a claim that the better chessmaster personality against chessmaster
>>>>>>personalities was not better against other programs, but I do not know what
>>>>>>result the winner got while still not being better. I believe that a
>>>>>>result of 90-10 always means that the winner is better from a practical point of
>>>>>>view, and the question is what is the minimal result from which you can be sure, based
>>>>>>on practical experience, that the winner is better.
>>>>>>
>>>>>>Uri
>>>>>
>>>>>well, it depends:
>>>>>
>>>>>if you know that the program you are going to start a match against is the only
>>>>>one that is creating resistance, then a good score against it would be
>>>>>significant.
>>>>>
>>>>>I believe a good score should be at least 75%
>>>>>
>>>>>If this version is not stronger than other programs even a score higher than 75%
>>>>>could mean very little.
>>>>>
>>>>>I guess a score of 95% should mean something, but one needs to check if the
>>>>>score was heavily dependent on the opening book or not.
>>>>
>>>>I am surprised by this opinion.
>>>>Note that I am talking about matches from predefined positions like the Noomen
>>>>match.
>>>>
>>>>Can you show me a single case in the CEGT when A got 70% against B in at least
>>>>50 games and still A has worse results than B against other programs?
>>>
>>>I am not making tests like CEGT as I am not interested in matches where the same
>>>book or predefined positions are set for all programs. This means that I am not
>>>checking these as well.
>>>
>>>I believe this type of test can be misleading and can give a limited amount of
>>>info, so I prefer tests like the SSDF ones.
>>>
>>>I like "realistic" tests and not "hypothetical" tests.
>>
>>I think that this type of test gives good information about the potential of the
>>engine (or the rating, assuming the quality of the book makers of both sides is
>>similar).
>>
>>Note that you predict Rybka to be 150 elo better than shredder9 and results of
>>CEGT tend to agree with your estimate.
>>
>>1  Rybka 1.01 Beta 13b 64-bit      2923  28  27   580  79.1 %  2692  24.3 %
>>2  Rybka 1.01 Beta 9 64-bit opti   2892  37  37   316  76.9 %  2683  24.7 %
>>14 Shredder 9                      2754   8   8  4816  63.2 %  2660  31.9 %
>>
>>you see that 150 elo difference between rybka and shredder is inside the error
>>bars.
>>
>>Uri
>
>I accidentally posted position number 2 instead of position number 3.
>
>If we use the 32 bit version of rybka then the difference is now only 115 elo
>but again if we consider that there is going to be additional improvement in
>rybka even the 32 bit version of rybka1.2 may be 150 elo better than shredder9
>in cegt and it is similar to what you predict for ssdf.

I know this, but most people speak/write without thinking or asking what is behind an expectation...

>
>3  Rybka 1.01 Beta 13-13b 32-bit   2869  28  28   494  73.9 %  2689  27.9 %
>
>I see that generally there is no big difference between results of ssdf and
>cegt.

I know this, but I prefer to evaluate the complete program and not the engine alone. The reason is that the complete program (engine+book+ETB) is what the user will get.

>
>There are small differences but I think that usually when there is a difference
>of more than 50 elo in one list the better program in one list is also better in
>the second list.

I am not against CEGT at all, but I prefer the SSDF list, because the program MAY BE developed on the assumption that a good book and ETB are available, and removing them would handicap it.

>
>Uri

Sandro
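
For readers who want to check the arithmetic behind the figures in this thread, here is a minimal Python sketch. It assumes the three numbers after each engine name in the quoted CEGT rows are the rating and its plus/minus error bars (an assumption based on the usual layout of such lists), it combines the error bars in the most naive way (best case against worst case), and it uses the standard logistic Elo formula to turn a match score into a rating difference. The function names are hypothetical, and none of this is claimed to be how CEGT or SSDF actually compute their lists.

```python
import math


def elo_diff_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic Elo formula."""
    return 400.0 * math.log10(score / (1.0 - score))


def diff_interval(rating_a, plus_a, minus_a, rating_b, plus_b, minus_b):
    """Naive range for (A - B): lowest A minus highest B,
    and highest A minus lowest B. This is an assumption, not
    a proper statistical combination of the error bars."""
    low = (rating_a - minus_a) - (rating_b + plus_b)
    high = (rating_a + plus_a) - (rating_b - minus_b)
    return low, high


# Rybka 1.01 Beta 13b 64-bit (2923 +28/-27) vs Shredder 9 (2754 +8/-8)
low, high = diff_interval(2923, 28, 27, 2754, 8, 8)
print(low, high)             # 134 205
print(low <= 150 <= high)    # True: 150 Elo lies inside the naive range

# 32-bit Rybka (2869) vs Shredder 9 (2754): the "115 elo" in the post
print(2869 - 2754)           # 115

# A 75% match score corresponds to roughly 191 Elo
print(round(elo_diff_from_score(0.75)))  # 191
```

Under these assumptions, the 75% threshold mentioned earlier in the thread corresponds to a gap of about 191 Elo between the two opponents in that one match, which says nothing by itself about results against other programs.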
Last modified: Thu, 15 Apr 21 08:11:13 -0700