Computer Chess Club Archives


Subject: Re: Proof

Author: Graham Banks

Date: 03:09:40 10/18/05



On October 18, 2005 at 06:05:48, Uri Blass wrote:

>On October 18, 2005 at 05:59:08, Graham Banks wrote:
>
>>On October 18, 2005 at 05:49:10, Uri Blass wrote:
>>
>>>On October 18, 2005 at 05:37:22, Graham Banks wrote:
>>>
>>>>On October 18, 2005 at 04:56:59, Uri Blass wrote:
>>>>
>>>>>On October 18, 2005 at 03:18:00, Graham Banks wrote:
>>>>>
>>>>>>On October 18, 2005 at 02:08:06, Uri Blass wrote:
>>>>>>
>>>>>>>From Heinz van Kempen's words:
>>>>>>>>The majority of CEGT testers is not so keen on testing personalities, but Fruit
>>>>>>>>is a special case.
>>>>>>>
>>>>>>>I do not see that Fruit is a special case based on the list.
>>>>>>>Looking at the list, it seems that the only special case is Chessmaster.
>>>>>>>
>>>>>>>Fruit has only one personality in the list besides the default, and I guess
>>>>>>>that it is not going to have more than that, while Chessmaster has 10
>>>>>>>personalities in the list besides the default.
>>>>>>>
>>>>>>>Of course it is the testers' choice what to test, but
>>>>>>>I wonder why they prefer testing Chessmaster.
>>>>>>>
>>>>>>>I counted 10 different personalities besides the default, and it is not clear
>>>>>>>whether even one of them is stronger than the default when the possible error
>>>>>>>in the default's rating is 23 Elo points.
>>>>>>>
>>>>>>>13 CM10th Milan 2.3 2679
>>>>>>>14 CM10th Pestilence 2678
>>>>>>>15 CM10th Behemoth 2676
>>>>>>>16 CM10th Cell 2676
>>>>>>>20 CM10th Imperator 2665
>>>>>>>21 CM10th Default 2664
>>>>>>>26 CM10th Berean 5.54 2650
>>>>>>>27 CM10th Steadfast 2643
>>>>>>>30 CM10th Behemoth II 2634
>>>>>>>34 CM10th D1Meandros 2628
>>>>>>>35 CM10th Yoda 2.5 2627
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>
>>>>>>Hi Uri,
>>>>>>
>>>>>>one thing I can tell you with 100% certainty is that all of these CM10th
>>>>>>settings are better than the default CM10th settings as the time control gets
>>>>>>longer.  I can provide proof of this if you require it.
>>>>>
>>>>>I do not know if you are correct, and I doubt that you have enough games against
>>>>>different opponents to prove it (I explain later in this post why I doubt that
>>>>>it can be correct).
>>>>>
>>>>>Unfortunately CEGT is not very interested in comparisons between different time
>>>>>controls, and I see only one Chessmaster at the 4/40 time control, so we do not
>>>>>even have evidence that all these personalities improve relative to the default
>>>>>when the time control is 40/40 rather than 4/40.
>>>>>
>>>>>>
>>>>>>When I joined CEGT, I was asked to run the 6+6 tournament I'm running involving
>>>>>>6 CM10th settings and 6 top engines.
>>>>>>As I was also restructuring my CM10th Showdown tournament at this point in time,
>>>>>>I offered to rerun it as a CEGT tournament.
>>>>>>
>>>>>>Note also that only the one setting of any program is included in one of the
>>>>>>rating lists.
>>>>>>
>>>>>>I feel it's a little bit like sour grapes to start questioning the worth of CEGT
>>>>>>now that Fruit 2.2 Uri isn't performing as well as hoped.
>>>>>>
>>>>>>Regards, Graham.
>>>>>
>>>>>It is not related.
>>>>>
>>>>>I also did not suggest that the CEGT should stop testing.
>>>>>I did not claim that no testing is better than testing, only that I do not
>>>>>understand the choice of the CEGT.
>>>>>
>>>>>Of course the CEGT, like the SSDF, is free to test what it wants, and if the SSDF
>>>>>also prefers to test 10 different personalities of one program, that is its
>>>>>right and I will not suggest that it stop testing because of it.
>>>>>
>>>>>It is not the first time that I do not understand the choice of CEGT.
>>>>>
>>>>>I also did not understand the choice to play a small number of blitz games
>>>>>relative to long time control games.
>>>>>
>>>>>The choice of 4/40 for blitz also did not seem very good to me. I thought that
>>>>>testers would prefer 2/40 for comparison with 40/40, but I read that some testers
>>>>>even preferred a slower time control for the blitz games, which simply goes
>>>>>against the whole idea of blitz games.
>>>>>
>>>>>The idea of blitz games is to compare long time control with blitz to see
>>>>>whether there are programs that are probably better at blitz.
>>>>>
>>>>>It may then be possible to speculate from that about longer time controls.
>>>>>
>>>>>As far as I know, we usually see a relatively small difference between 4/40 and
>>>>>40/40, which may suggest that the time controls need to differ by more than
>>>>>1:10 in order to see a big difference. So if there is no significant difference
>>>>>between the CM default and another CM personality at 40/40, then I do not think
>>>>>there is going to be a significant difference between them at a time control
>>>>>slower by a factor of 2 or 3.
>>>>>
>>>>>Uri
>>>>
>>>>
>>>>
>>>>Note the time control used and the performance of the default settings in
>>>>relation to 40/30 on my machine.
>>>>
>>>>THE GREAT CM10th SHOWDOWN!
>>>>
>>>>Athlon XP1900+
>>>>128mb hash each
>>>>3,4,5 men tablebases
>>>>Ponder on
>>>>No opening books
>>>>78 rounds (2 cycles) at 90 mins + 30 secs
>>>>
>>>>
>>>>Standings after Round 32
>>>>
>>>>20.5 - D1 Meandros
>>>>20.0 - WoDra
>>>>19.5 - GL
>>>>19.5 - Milan 2.6
>>>>19.0 - SoFar 2
>>>>19.0 - Tsunami
>>>>18.5 - Cell
>>>>18.5 - Clown 1.01
>>>>17.5 - Beast
>>>>17.5 - Milan 2.4
>>>>17.5 - Milan 1.5
>>>>17.5 - Milan 2.3
>>>>17.5 - R2D2
>>>>17.5 - Emperor
>>>>17.0 - Undertaker
>>>>16.5 - Behemoth
>>>>16.5 - C3PO
>>>>16.5 - Berean 5.54
>>>>16.0 - Milan 2.5
>>>>16.0 - R1X
>>>>16.0 - D1 Pyr
>>>>15.5 - Wrath
>>>>15.5 - Scorpion
>>>>15.5 - D2 Alos
>>>>15.5 - Steadfast
>>>>15.0 - Milan 2.1
>>>>15.0 - Salamander
>>>>14.5 - Berean 5.53
>>>>14.5 - SoFar
>>>>14.5 - Yoda 2.7
>>>>14.5 - Darth Vader
>>>>14.5 - Schumacher
>>>>14.5 - Juggernaut
>>>>14.5 - Predator
>>>>13.5 - Medusa
>>>>12.5 - Myrddin
>>>>12.5 - Solomon
>>>>12.0 - Default
>>>>12.0 - Cobra
>>>>11.0 - Vegeta 2d
>>>
>>>No proof.
>>>
>>>The number of games is not enough.
>>>
>>>The same program can score 12/32 in one tournament and 20/32 in another
>>>tournament even without changing the time control.
>>>
>>Not under these conditions if you look - "no books"
>
>You may be right, but these are not the CEGT conditions; they start from
>predefined book positions.
>
>A program that is better with no book at a longer time control may not be better
>when you use some small book.
>
>You have some evidence to support the opinion that the Chessmaster default may be
>relatively worse at long time controls, but not enough to prove it, even if we
>want only 95% certainty.
>
>Uri
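
The sample-size argument quoted above can be made concrete with a rough calculation. The sketch below is not from the thread. It assumes independent games, a normal approximation, and a worst-case per-game standard deviation of 0.5 (no draws; draws would narrow the interval), and it converts scores to Elo with the usual logistic formula. The function names are illustrative only.

```python
import math

def score_interval(points, games, z=1.96, per_game_sd=0.5):
    """Approximate 95% interval for the true score fraction.

    per_game_sd=0.5 is an assumed worst case (no draws).
    """
    p = points / games
    half_width = z * per_game_sd / math.sqrt(games)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic model).

    Undefined for fractions of exactly 0 or 1.
    """
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

# The two results mentioned in the thread: 12/32 and 20/32.
for points in (12, 20):
    low, high = score_interval(points, 32)
    print(f"{points}/32: score fraction between {low:.3f} and {high:.3f}, "
          f"Elo between {elo_diff(low):+.0f} and {elo_diff(high):+.0f}")
```

Under these assumptions a true score of 50% lies inside the interval for both 12/32 and 20/32, which is consistent with the remark that the same program could produce either result over 32 games.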


Hi Uri,

my CEGT CM10th Showdown is also using no books. Here is the comparison (only
early days I know!   :-)

THE CEGT CM10th SHOWDOWN!

Athlon XP1900+
128mb hash each
3,4,5 men tablebases
Ponder off
No opening books
58 rounds (2 cycles) at 40 moves in 30 minutes repeating


Standings after Round 14 of 58

9.5 - SoFar 2
9.0 - Medusa
9.0 - Default
9.0 - Pestilence
9.0 - Steadfast
8.5 - Emperor II
8.5 - GL
8.5 - Tsunami
8.5 - Behemoth
8.0 - D2 Alos
8.0 - WoDra
8.0 - Milan 2.3
8.0 - Cell
7.5 - D1 Meandros
7.0 - R10
7.0 - Berean 5.54
7.0 - Salamander
7.0 - R2D2 II
6.5 - Schumacher
6.5 - Behemoth II
6.5 - Undertaker 3
6.0 - Beast
5.5 - Milan 1.5
5.5 - SoFar 3
5.5 - Yoda 2.7
5.0 - Milan 2.6
4.5 - Fury
4.5 - Clown 1.01
3.5 - Boa
3.5 - Myrddin


Regards, Graham.
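
For reference, the CEGT ratings quoted at the start of the thread can be checked against the 23-Elo error margin Uri mentions. This is a minimal sketch, not part of the original posts: it assumes the margin applies to the default's rating only and ignores the unstated margins of the other entries, so it is a quick filter rather than a significance test. The helper name is hypothetical.

```python
# Ratings as quoted in the thread (excerpt of the CEGT list).
ratings = {
    "Milan 2.3": 2679,
    "Pestilence": 2678,
    "Behemoth": 2676,
    "Cell": 2676,
    "Imperator": 2665,
    "Berean 5.54": 2650,
    "Steadfast": 2643,
    "Behemoth II": 2634,
    "D1Meandros": 2628,
    "Yoda 2.5": 2627,
}
DEFAULT_RATING = 2664
ERROR_MARGIN = 23  # stated in the thread for the default's rating only

def outside_margin(ratings, baseline, margin):
    """Split entries into those above or below the baseline by more than margin."""
    stronger = [name for name, r in ratings.items() if r - baseline > margin]
    weaker = [name for name, r in ratings.items() if baseline - r > margin]
    return stronger, weaker

stronger, weaker = outside_margin(ratings, DEFAULT_RATING, ERROR_MARGIN)
print("Above default by more than the margin:", stronger)
print("Below default by more than the margin:", weaker)
```

With the quoted figures, no personality is above the default by more than 23 points; the largest gap above it is 15 (Milan 2.3).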




Last modified: Thu, 15 Apr 21 08:11:13 -0700
