Computer Chess Club Archives


Subject: Re: PLEASE can someone test CM 6,7,8000 with other programs?

Author: John Merlino

Date: 13:13:57 04/02/01


On April 02, 2001 at 15:17:05, stuart taylor wrote:

>On April 02, 2001 at 14:07:58, John Merlino wrote:
>
>>On April 02, 2001 at 06:37:22, stuart taylor wrote:
>>
>>>On April 01, 2001 at 13:37:31, Jorge wrote:
>>>
>>>>On April 01, 2001 at 06:31:46, stuart taylor wrote:
>>>>
>>>>>On April 01, 2001 at 01:39:10, Jorge wrote:
>>>>>
>>>>>>On April 01, 2001 at 00:21:52, Lin Harper wrote:
>>>>>>
>>>>>>>On March 31, 2001 at 22:15:53, stuart taylor wrote:
>>>>>>>
>>>>>>>>I'm longing to get to the bottom of this, and to know exactly where each of
>>>>>>>>CM6000, CM7000 and CM8000 stand, in relation to each other, as well as compared
>>>>>>>>to a selection of other programs, and how the three CMs compare in their
>>>>>>>>comparisons to a few other programs.
>>>>>>>>  Can someone do this one time, despite any difficulties involved?
>>>>>>>>S.Taylor
>>>>>>>     I've got CM6K and CM8K, no other recent programs. I've only got one
>>>>>>>  computer, and that's the problem most people have. It's not good playing
>>>>>>>  two programs against each other, with ponder off on both programs, IMO,
>>>>>>>  because ponder on is the default on them all (I think), and that's the
>>>>>>>  only way to give a program its full rein. Different programs will not
>>>>>>>  necessarily be handicapped equally with ponder off. That's where auto
>>>>>>>  232 comes in.
>>>>>>
>>>>>>Yes, I have 2 computers (PentIII 667 and Athlon 500, 128 RAM) and have both
>>>>>>programs 6K and 8K installed. I don't have Auto232 though, so I play some of the
>>>>>>games by hand.
>>>>>
>>>>>That could do a good job, I think, if you play all 8 games, but 4 on
>>>>>each computer (2x Black and 2x White), and switch computers, so that each has
>>>>>the same benefits and the same slight handicaps. That will also show how much
>>>>>the slight speed differences affect each, if at all.
>>>>>S.Taylor
>>>>All right, I'm curious to find out too. Let me know which settings (Default? for
>>>>both 6K and 8K) and Time control I can use. I will post the games here.
>>>
>>>That would be wonderful!
>>>I think that tournament timings or thereabouts would be most interesting to know
>>>about. Other settings, whatever is both strongest, and most equal to each other,
>>>then rotate for the other half of the games.
>>>  If it can be an actual miniature tournament, that would be excellent, by adding
>>>one, two or three other of the high level programs and making it all play all,
>>>so we can see how both CM's compare in their handling of each other program.
>>> But a simple match will also be great. And if it is 8 games each, that gives a
>>>very good chance for getting a good idea of things, as each computer can have
>>>twice white and twice black for each of the CM's.(6K and 8K)
>>> Four games each (in this way) is also good, especially if it is too much work,
>>>and you're adding other programs (even one).
>>>thanks,
>>>S.Taylor
>>
>>To get ANY reasonable conclusion from a tournament between two programs, I
>>suspect you would need at LEAST 20 games (and others here -- the more
>>statistically aware of us -- would say that you probably need closer to 50-100
>>games).
>>
>>Eight games will prove nothing, even if the score is 8-0 for the winner.
>>
>>Sadly, the only way to compare CM6000 vs. CM8000 is manually WITH TWO MACHINES.
>>Running both of these programs on the same machine gives a CPU advantage to
>>CM6000. I doubt many people have the time and the hardware for this kind of
>>testing, so getting the required number of games will take quite a long time.
>>
>>jm
>
>I feel convinced that CM6K vs. CM8K vs. 3 other top programs all play all 8
>times, in a sensible way, like what I believe Jorge is in a situation to do
>(i.e. half & half) would indicate things very clearly, if the results are
>clearly tilted one way or the other. It will also be possible to understand much
>more from studying the games, and the nature of the results than merely looking
>for cold statistics and nothing else.
>  An 8-0 score is pretty much conclusive. 5-3 may not be. I think 6.5-1.5 looks
>almost conclusive, whereas 6-2 is not yet.
>  And even if after 8x4 (all play all)[32 games each, but not just any old
>games] the results are quite close, a lot of other factors could be seen, which
>would give a very good idea of where things stand. Certainly something to talk
>about.
>  And above all, it will be MUCH MUCH better than the darkness we're in now! And
>I don't recall anyone on this board disagreeing that CM6K is stronger than CM8K.
>That's where things stand at present, but perhaps we can see something just a
>little bit clearer.
>  Even a short match would be better than nothing. It will STILL be somewhat
>ambiguous, but I strongly believe, a bit less so.
>S.Taylor

You may think that "6.5-1.5 looks almost conclusive", but it really,
statistically, does not allow for any conclusions. Dr. Hyatt has said that,
given enough games (or a string of bad luck), even a 10-0 score really proves
nothing! Look at this post (and thread):

http://www.chessusa.com/forums/1/message.shtml?160782

Now, truthfully, I would be concerned about a 10-0 score. :-) But I would
still say that, statistically, it proves nothing because the total test sample
is too small.
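
For a rough sense of how sample size limits what a match can tell you, here is
a minimal Python sketch. It assumes the games are independent and uses the
worst-case per-game spread of 0.5 points (no draws, evenly matched engines), so
the margins are an upper bound rather than an exact figure. The function name
score_margin is made up for illustration and does not come from any chess tool.

```python
import math

def score_margin(games, z=1.96):
    """Approximate 95% margin of error on the score fraction (0..1).

    Assumes independent games and the worst-case per-game standard
    deviation of 0.5 (no draws, equal engines). This is a normal
    approximation, which is itself crude for very short matches.
    """
    return z * 0.5 / math.sqrt(games)

# Match lengths mentioned in this thread
for n in (8, 10, 20, 50, 100):
    print(f"{n:3d} games: score fraction known to about +/- {score_margin(n):.2f}")
```

Under these assumptions an 8-game match pins the score fraction down only to
within about 0.35, and 100 games still leaves roughly 0.10. It ignores draws,
opening choice and hardware differences, so treat it as an order-of-magnitude
guide only.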

However, you are correct in saying that ANALYZING the games CAN prove to be more
useful than mere hard results. But this will take even MORE time, of course. A
tournament in which 5 engines played 32 games each would mean analyzing 80
games, which is quite time-consuming both in CPU time and in human scrutiny of
the analysis results.
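
The game count can be checked with a short Python sketch. It assumes an
all-play-all among five engines with 8 games per pairing, as proposed earlier
in the thread; the three engine names other than CM6K and CM8K are placeholders,
since nobody has named them.

```python
from itertools import combinations

# "Engine A/B/C" are placeholders for the three unnamed top programs
engines = ["CM6K", "CM8K", "Engine A", "Engine B", "Engine C"]
games_per_pairing = 8

# Every unordered pair of engines meets once as a pairing
pairings = list(combinations(engines, 2))
total_games = len(pairings) * games_per_pairing
games_per_engine = (len(engines) - 1) * games_per_pairing

print(f"pairings: {len(pairings)}")
print(f"games per engine: {games_per_engine}")
print(f"total games to analyze: {total_games}")
```

Each game involves two engines, which is why 5 x 32 comes to 80 games rather
than 160.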

I'm not saying that it shouldn't be done; I'm just questioning the quality of
the data against the amount of time it will take to acquire it.

On a completely different side of the discussion (which is about the "Selective
Search" setting), I am currently near the end of a 16-game tournament between
CM8000 SS=6 and CM8000 SS=12 (with both personalities using a 32MB hash table)
at tournament time controls. Right now, SS=12 is leading by 7.0-6.0 (+3 =8 -2).
So, once again, there is still no statistical data that can show that SS=12 is
better than SS=6, despite the fact that, in many cases, it solves test suite
problems faster.
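
To put rough numbers on the +3 =8 -2 score, here is a small Python sketch. It
assumes independent games, a normal approximation (shaky at 13 games) and the
usual logistic Elo formula; the helper names match_summary and elo_diff are my
own, not part of Chessmaster or any rating tool.

```python
import math

def match_summary(wins, draws, losses, z=1.96):
    """Return (score fraction, low, high) with a rough 95% range."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance from the observed results (1, 0.5 or 0 points)
    mean_sq = (wins * 1.0 + draws * 0.25) / games
    std_err = math.sqrt((mean_sq - score ** 2) / games)
    return score, score - z * std_err, score + z * std_err

def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400 * math.log10(score / (1 - score))

# SS=12 versus SS=6 after 13 games: +3 =8 -2
score, low, high = match_summary(3, 8, 2)
print(f"score {score:.3f}, rough 95% range {low:.3f} to {high:.3f}")
print(f"Elo estimate {elo_diff(score):+.0f}, "
      f"range {elo_diff(low):+.0f} to {elo_diff(high):+.0f}")
```

The point estimate is a small edge for SS=12 (about +27 Elo), but the range runs
from roughly 0.37 to 0.71 and so straddles 0.5, meaning the result is
compatible with either setting being the stronger one.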

As for your comment ("I don't recall anyone on this board disagreeing that CM6K
is stronger than CM8K."), I also do not recall anybody stating with any
certainty that CM6K WAS stronger than CM8K. People have expressed their opinions
(on both sides of the argument), but nobody has shown any clear evidence one way
or another.

If this manual, two-machine tournament can be organized and completed (and I
really don't see the need for the other three engines, as we're really only
concerned about CM6K vs. CM8K), then finally we MIGHT have some useful data.
Until then, we are all just speculating....

jm



Last modified: Thu, 15 Apr 21 08:11:13 -0700
