Author: Graham Banks
Date: 03:09:40 10/18/05
On October 18, 2005 at 06:05:48, Uri Blass wrote:

>On October 18, 2005 at 05:59:08, Graham Banks wrote:
>
>>On October 18, 2005 at 05:49:10, Uri Blass wrote:
>>
>>>On October 18, 2005 at 05:37:22, Graham Banks wrote:
>>>
>>>>On October 18, 2005 at 04:56:59, Uri Blass wrote:
>>>>
>>>>>On October 18, 2005 at 03:18:00, Graham Banks wrote:
>>>>>
>>>>>>On October 18, 2005 at 02:08:06, Uri Blass wrote:
>>>>>>
>>>>>>>From Heinz van Kempen's words:
>>>>>>>>The majority of CEGT testers is not so keen on testing personalities, but Fruit
>>>>>>>>is a special case.
>>>>>>>
>>>>>>>I do not see that fruit is a special case based on the list.
>>>>>>>Based on looking at the list it seems that the only special case is chessmaster.
>>>>>>>
>>>>>>>Fruit has only one personality in the list except the default and I guess that
>>>>>>>it is not going to have more than it when chessmaster has 10 personalities in
>>>>>>>the list except the default.
>>>>>>>
>>>>>>>Of course it is the testers choice what to test but
>>>>>>>I wonder what is the reason that they prefer testing chessmaster.
>>>>>>>
>>>>>>>I counted 10 different personalities except the default and it is not clear if
>>>>>>>even one of them is stronger than the default when the possible error in the
>>>>>>>default's rating is 23 elo points.
>>>>>>>
>>>>>>>13 CM10th Milan 2.3 2679
>>>>>>>14 CM10th Pestilence 2678
>>>>>>>15 CM10th Behemoth 2676
>>>>>>>16 CM10th Cell 2676
>>>>>>>20 CM10th Imperator 2665
>>>>>>>21 CM10th Default 2664
>>>>>>>26 CM10th Berean 5.54 2650
>>>>>>>27 CM10th Steadfast 2643
>>>>>>>30 CM10th Behemoth II 2634
>>>>>>>34 CM10th D1Meandros 2628
>>>>>>>35 CM10th Yoda 2.5 2627
>>>>>>>
>>>>>>>Uri
>>>>>>>Uri
>>>>>>
>>>>>>
>>>>>>Hi Uri,
>>>>>>
>>>>>>one thing I can tell you with 100% certainty is that all of these CM10th
>>>>>>settings are better than the default CM10th settings as the time control gets
>>>>>>longer. I can provide proof of this if you require it.
>>>>>
>>>>>I do not know if you are correct and I doubt if you have enough games against
>>>>>different opponents to prove it (I explain later in this post why I doubt if it
>>>>>can be correct).
>>>>>
>>>>>Unfortunately CEGT are not very interested in comparison between different time
>>>>>control and I see only one chessmaster in 4/40 time control so we even have no
>>>>>evidence that all these personalities improve relative to the default when the
>>>>>time control is 40/40 relative to 4/40.
>>>>>
>>>>>>
>>>>>>When I joined CEGT, I was asked to run the 6+6 tournament I'm running involving
>>>>>>6 CM10th settings and 6 top engines.
>>>>>>As I was also restructuring my CM10th Showdown tournament at this point in time,
>>>>>>I offered to rerun it as a CEGT tournament.
>>>>>>
>>>>>>Note also that only the one setting of any program is included in one of the
>>>>>>rating lists.
>>>>>>
>>>>>>I feel it's a little bit like sour grapes to start questioning the worth of CEGT
>>>>>>now that Fruit 2.2 Uri isn't performing as well as hoped.
>>>>>>
>>>>>>Regards, Graham.
>>>>>
>>>>>It is not related.
>>>>>
>>>>>I also did not suggest that the CEGT will stop testing.
>>>>>I did not claim that no testing is better than testing but only that I do not
>>>>>understand the choice of the CEGT.
>>>>>
>>>>>Of course the CEGT like the SSDF is free to test what they want and if the SSDF
>>>>>will also prefer to test 10 different personalities of one program it is their
>>>>>right and I will not suggest them to stop testing because of it.
>>>>>
>>>>>It is not the first time that I do not understand the choice of CEGT.
>>>>>
>>>>>I also did not understand the choice to do small number of blitz games relative
>>>>>to long time control.
>>>>>
>>>>>The choice of blitz of 4/40 also seemed to me not very good and I thought that
>>>>>testers will prefer 2/40 for comparison with 40/40 but I read that some testers
>>>>>even preferred slower time control in the blitz games that is simply against all
>>>>>the idea of blitz games.
>>>>>
>>>>>The idea of blitz games is to compare between long time control and blitz to see
>>>>>if there are programs that are probably better in blitz.
>>>>>
>>>>>It may be possible to try to speculate from it about longer time control.
>>>>>
>>>>>As far as I know we usually see relatively small difference between 4/40 and
>>>>>40/40 and it may suggest that the difference in time control should be more than
>>>>>1:10 in order to see big difference so if there is no significant difference
>>>>>between CM default and other CM personality at 40/40 then I do not think that
>>>>>there is going to be a significant difference between CM default and other CM
>>>>>personality in a slower time control by factor of 2 or 3.
>>>>>
>>>>>Uri
>>>>
>>>>
>>>>
>>>>Note the time control used and the performance of the default settings in
>>>>relation to 40/30 on my machine.
>>>>
>>>>THE GREAT CM10th SHOWDOWN!
>>>>
>>>>Athlon XP1900+
>>>>128mb hash each
>>>>3,4,5 men tablebases
>>>>Ponder on
>>>>No opening books
>>>>78 rounds (2 cycles) at 90 mins + 30 secs
>>>>
>>>>
>>>>Standings after Round 32
>>>>
>>>>20.5 - D1 Meandros
>>>>20.0 - WoDra
>>>>19.5 - GL
>>>>19.5 - Milan 2.6
>>>>19.0 - SoFar 2
>>>>19.0 - Tsunami
>>>>18.5 - Cell
>>>>18.5 - Clown 1.01
>>>>17.5 - Beast
>>>>17.5 - Milan 2.4
>>>>17.5 - Milan 1.5
>>>>17.5 - Milan 2.3
>>>>17.5 - R2D2
>>>>17.5 - Emperor
>>>>17.0 - Undertaker
>>>>16.5 - Behemoth
>>>>16.5 - C3PO
>>>>16.5 - Berean 5.54
>>>>16.0 - Milan 2.5
>>>>16.0 - R1X
>>>>16.0 - D1 Pyr
>>>>15.5 - Wrath
>>>>15.5 - Scorpion
>>>>15.5 - D2 Alos
>>>>15.5 - Steadfast
>>>>15.0 - Milan 2.1
>>>>15.0 - Salamander
>>>>14.5 - Berean 5.53
>>>>14.5 - SoFar
>>>>14.5 - Yoda 2.7
>>>>14.5 - Darth Vader
>>>>14.5 - Schumacher
>>>>14.5 - Juggernaut
>>>>14.5 - Predator
>>>>13.5 - Medusa
>>>>12.5 - Myrddin
>>>>12.5 - Solomon
>>>>12.0 - Default
>>>>12.0 - Cobra
>>>>11.0 - Vegeta 2d
>>>
>>>No proof.
>>>
>>>number of games is not enough.
>>>
>>>The same program can score 12/32 in one tournament and 20/32 in another
>>>tournament even without changing the time control.
>>>
>>Not under these conditions if you look - "no books"
>
>You may be right but this is not the CEGT conditions and they use the condition
>of starting from predefined book positions.
>A program that is better with no book at longer time control may be not better
>when you use some small book.
>
>You have some evidence to support the opinion that chessmaster default may be
>relatively worse at long time control but not enough to prove it even if we want
>only 95% certainty.
>
>Uri

Hi Uri,

my CEGT CM10th Showdown is also using no books.
Here is the comparison (only early days I know! :-)

THE CEGT CM10th SHOWDOWN!

Athlon XP1900+
128mb hash each
3,4,5 men tablebases
Ponder off
No opening books
58 rounds (2 cycles) at 40 moves in 30 minutes repeating

Standings after Round 14 of 58

9.5 - SoFar 2
9.0 - Medusa
9.0 - Default
9.0 - Pestilence
9.0 - Steadfast
8.5 - Emperor II
8.5 - GL
8.5 - Tsunami
8.5 - Behemoth
8.0 - D2 Alos
8.0 - WoDra
8.0 - Milan 2.3
8.0 - Cell
7.5 - D1 Meandros
7.0 - R10
7.0 - Berean 5.54
7.0 - Salamander
7.0 - R2D2 II
6.5 - Schumacher
6.5 - Behemoth II
6.5 - Undertaker 3
6.0 - Beast
5.5 - Milan 1.5
5.5 - SoFar 3
5.5 - Yoda 2.7
5.0 - Milan 2.6
4.5 - Fury
4.5 - Clown 1.01
3.5 - Boa
3.5 - Myrddin

Regards, Graham.
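
For anyone who wants to check the sample-size point quoted above (12/32 versus 20/32, and the "95% certainty" remark), here is a minimal Python sketch. It is only a rough estimate under stated assumptions: every game is treated as an independent win/loss trial, so draws are ignored and the spread is overstated, and the usual logistic Elo formula is assumed. The function names `score_interval` and `elo_diff` are made up for this illustration and are not part of any CEGT tool.

```python
import math


def score_interval(points, games, z=1.96):
    """Approximate 95% interval for a score fraction.

    Assumption: each game is an independent win/loss trial (normal
    approximation to the binomial). Draws reduce the real variance,
    so this interval is wider than the true one.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p - z * se, p + z * se


def elo_diff(p):
    """Elo difference implied by an expected score p (0 < p < 1),
    assuming the standard logistic Elo model."""
    return -400.0 * math.log10(1.0 / p - 1.0)


if __name__ == "__main__":
    # The two scores mentioned in the quoted exchange.
    for points, games in ((12, 32), (20, 32)):
        low, high = score_interval(points, games)
        print(f"{points}/{games}: {low:.3f} .. {high:.3f} "
              f"(Elo {elo_diff(low):+.0f} .. {elo_diff(high):+.0f})")
```

Under these assumptions the interval for 12/32 runs from roughly 0.21 to 0.54 and the one for 20/32 from roughly 0.46 to 0.79, so the two overlap. That only says 32 games cannot cleanly separate the two results under this crude model. It does not settle which setting is stronger.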
Last modified: Thu, 15 Apr 21 08:11:13 -0700