Author: Robert Hyatt
Date: 18:59:47 08/31/98
On August 31, 1998 at 21:33:14, Mark Young wrote:

>On August 31, 1998 at 19:46:39, Robert Hyatt wrote:
>
>>On August 31, 1998 at 17:57:02, Mark Young wrote:
>>
>>>Is computer vs. computer testing now useless in gauging a chess program's
>>>strength playing humans? Crafty gets killed playing Junior 5 by a wide
>>>margin, and Fritz 5 draws a match with Rebel 10 even when Rebel 10 has a
>>>2x hardware advantage. Is it time to abandon computer vs. computer testing
>>>altogether? Or are we going to have two standards for judging chess
>>>programs: one program being the best at playing other programs, and
>>>another being the best at playing humans? If so, what is the best standard
>>>for judging a program's overall strength? Is it better marketing to show
>>>you can destroy all other programs, as Junior 5 and Fritz 5 can, or is it
>>>better to show you can beat a top grandmaster, as Rebel 10 can?
>>
>>As I've said many times, you are talking about *two* different games at
>>present. As an example, take CSTal, which might do very well against a
>>human with its speculative/complicated style, but which does very badly
>>against fast searchers. If you were to measure CSTal's worth by only
>>playing against fast programs, you might toss it out. If you only measured
>>it by playing against humans, you might decide it is the best there is. In
>>reality, both answers (or neither answer) could be right...
>
>I know, but the gulf between computer vs. computer results and computer vs.
>human results has now reached a point of absurdity. Take your program for
>example. Crafty went -16, =4, +0 playing Junior 5. If you add the games I
>played with Fritz 5, also a fast searcher, the results are -21, =4, +0 for
>Crafty. That would suggest a rating difference of 392 points, which would
>make Crafty only an expert-rated chess player. Crafty is not the only
>program that suffers from this. M-Chess Pro has taken a big hit in Ed's
>testing. Rebel has also, but to a lesser extent. The point being, with the
>results so skewed now, is computer vs. computer testing more harmful than
>helpful to chess programmers and the buying public? Unless the goal of
>chess programmers is now just to try to beat each other, and to hell with
>the consumer and how the program performs when playing people.

I haven't spent a lot of time looking at the results from Moritz' testing.
In the case of Crafty vs. Fritz, that was a known problem... that seems to
be fixed at present. As far as Junior vs. Crafty, however, testing on a
single machine breaks some things I allowed for... i.e., *knowing* that
Crafty will predict moves correctly and save time here and there. But with
no "pondering" this can't happen, the assumption is wrong, and the time
utilization gets out of whack.

There are other potential problems... Crafty isn't a great bullet player
against other programs... some of the things I do in the search really
hurt it when the searches are shallow. So there's plenty to go wrong...

The games vs. Fritz were much more interesting to me, in that at least
they were 40/2 with full machines on both ends... We can certainly try to
resume that when you want...

Bob

BTW, one interesting thing I have noticed and reported before... it is
quite common for a small change in one version of a program to make a huge
difference in engine-vs-engine testing... I have done things that made
Crafty vs. Crafty go 3-1 in the new version's favor, yet on ICC it hardly
changes the ratings at all...
So lots of small things can make really big differences... and until I
have time to study more of what Junior's games reveal, I really can't say
much about what is going on...
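
For reference, the 392-point figure quoted above comes from converting a
match score into an implied rating difference. Here is a minimal sketch
using the common logistic Elo model; the exact number depends on which
conversion table you use, and this model gives about -424 for the same
+0 =4 -21 score, in the same ballpark as the 392 quoted above:

    # Minimal sketch (assumed logistic Elo model, not from the post above):
    # convert a match score into the rating difference it implies.
    import math

    def elo_difference(wins: int, draws: int, losses: int) -> float:
        """Rating difference implied by a match score, inverting the
        logistic expectation E = 1 / (1 + 10**(-D/400))."""
        score = (wins + 0.5 * draws) / (wins + draws + losses)
        return 400.0 * math.log10(score / (1.0 - score))

    # Crafty's combined result vs. Junior 5 and Fritz 5: +0, =4, -21.
    print(elo_difference(0, 4, 21))  # about -424 under this model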
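
And to make the pondering point above concrete: here is a hypothetical
sketch of a time allocator that budgets per-move time assuming some
fraction of future moves will be ponder hits. Everything in it (the
function, the 50% hit rate) is an illustrative assumption, not Crafty's
actual code. With pondering disabled, the assumed savings never
materialize and the budget goes wrong in exactly the way described above:

    # Hypothetical sketch of the time-allocation assumption; the 50% hit
    # rate and the structure are illustrative, not Crafty's actual code.

    def time_per_move(remaining_secs: float, moves_to_go: int,
                      ponder_hit_rate: float = 0.5) -> float:
        """Budget time for the next move, assuming some fraction of
        future moves will be ponder hits that cost almost no clock."""
        paying_moves = max(moves_to_go * (1.0 - ponder_hit_rate), 1.0)
        return remaining_secs / paying_moves

    # 40 moves in 120 minutes (the 40/2 control mentioned above):
    print(time_per_move(120 * 60, 40))  # 360 s/move budgeted
    # With pondering off, no hits ever occur, every move pays full
    # price, and this allocator burns the clock about twice as fast
    # as it should.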