Computer Chess Club Archives


Subject: Re: the guy who said cm6000 is stronger than 8000 is right!

Author: James T. Walker

Date: 17:46:36 05/22/01



On May 22, 2001 at 15:25:08, Dann Corbit wrote:

>On May 22, 2001 at 14:50:00, James T. Walker wrote:
>
>>On May 22, 2001 at 13:39:22, Dann Corbit wrote:
>>
>>>On May 22, 2001 at 13:27:09, stuart taylor wrote:
>>>[snip]
>>>>Yes, Crafty is number 4, which I overlooked. Sorry!
>>>>But I didn't overlook the commercial list. But that was very few games, which is
>>>>good, but says very little. But you can't just bundle all the amateur programs
>>>>together with it to make CM8K look so great overall. It's nowhere near the
>>>>same category as tests against recent commercial programs.
>>>
>>>In what way?
>>>
>>>According to the last WMCCC, the world champion was Shredder.  The runner up was
>>>Ferret, an amateur program.
>>>
>>>In the previous CCT contest, in which several commercial programs participated,
>>>the winner was...
>>>Crafty -- an amateur program.
>>>
>>>Look at the recent Leiden contest.  Some amateur programs were near the top and
>>>triumphed over some commercial entries.  There used to be a large gap between
>>>the amateur and commercial programs.  I believe that the gap was mostly due to
>>>superior opening books of the commercial programs.  That gap has narrowed, as
>>>the amateur entries now operate with sophisticated opening books.
>>>
>>>I believe that the gap between the strongest amateur programs and the strongest
>>>commercial programs is very small.  Of course, there is not enough empirical
>>>data to back up my assertion, so it is only an opinion.
>>
>>What do you mean by "very small??"  The latest SSDF list has Deep Fritz at 2650
>>after 470 games and Crafty 17.07 at 2487 after 857 games.  What would constitute
>>"enough empirical data??"  On what empirical data do you base your "opinion?"
>
>                                            Rating   +   -    Games  Won  Oppo
>   1 Deep Fritz  128MB K6-2 450 MHz          2650   34   -32   470   66%  2537
>   2 Fritz 6.0  128MB K6-2 450 MHz           2626   24   -24   897   66%  2512
>   3 Junior 6.0  128MB K6-2 450 MHz          2594   22   -21  1109   64%  2490
>   4 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz 2578   27   -27   691   62%  2492
>   6 Fritz 5.32  128MB K6-2 450 MHz          2547   26   -26   741   59%  2485
>   6 Nimzo 7.32  128MB K6-2 450 MHz          2547   24   -24   857   59%  2485
>   8 Nimzo 8.0  128MB K6-2 450 MHz           2539   30   -30   546   58%  2486
>   9 Gandalf 4.32f  128MB K6-2 450 MHz       2529   29   -29   584   52%  2518
>  10 Junior 5.0  128MB K6-2 450 MHz          2528   26   -25   750   57%  2476
>  11 Hiarcs 7.01  128MB K6-2 450 MHz         2526   37   -37   361   48%  2539
>  12 Hiarcs 7.32  128MB K6-2 450 MHz         2525   27   -27   679   56%  2481
>  13 SOS  128MB  K6-2 450 MHz                2524   23   -23   925   53%  2501
>  14 Rebel Century 3.0  128MB K6-2 450 MHz   2514   31   -31   504   50%  2516
>  15 Goliath Light  128MB K6-2 450 MHz       2496   30   -30   546   46%  2527
>  16 Crafty 17.07/CB 128MB K6-2 450 MHz      2487   24   -24   857   47%  2505
>
>In order for confidence to rise from about 2/3 to 95%, we must use two standard
>deviations.  Within that error bar, Deep Fritz is between
>2650 - 2*32 = 2586 and 2650 + 2*34 = 2718 ELO,
>and Crafty 17.07 (quite an old version) is between
>2487 - 2*24 = 2439 and 2487 + 2*24 = 2535 ELO.
>
>So, considering the error bars (2439, 2535) vs. (2586, 2718), which do not
>overlap, we *can* say Deep Fritz is a little stronger with pretty good
>certainty.  But (of course) Crafty has gone through a raft of versions since
>then.  Is the same difference still true?
>
>Yace is similar in strength.  If given opening books of equal quality, I suspect
>that the best amateur programs (e.g. Yace, Crafty) are very close to even with
>the strongest commercial programs.
>
>I think that most people have no clue what the numbers in the SSDF list mean
>(and I mean not a *single* one of the numbers) and that's too bad.  I'm not
>saying that you don't, of course.  But even the ELO figure is widely
>misunderstood.
>
>Does the autoplayer used to play these games still issue a reset between each
>move to the engines (which the commercial programs are designed to ignore)?
>
>Personally, I think the learning attribute of some programs is not an unfair
>bias in autoplayer testing, since the programs will also learn in actual use
>and thereby improve.  But it raises an interesting question.  Have the learning
>programs been playing longer than new entries and gaining constantly from their
>learning files?  If so, is the test accurate?
>
>In other words, I think:
>1.  The SSDF is definitely the best data we have available to determine
>engine/engine strength estimates
>
>2.  The data rapidly ages with new versions of programs [e.g. Tiger 12 -- aren't
>we on Tiger 14 now?  Crafty 17.07 -- aren't we on 18.9 now?]
>
>3.  The data is valid ONLY for the machines in actual use and the exact
>conditions of the trials.  On different architectures, the engines quite likely
>will play very differently.  I have observed this effect strongly with Intel
>compiler builds running on older and on newer machines: up to a factor of 4
>difference from what you would expect judging by CPU MHz alone.
>
>4.  There may be flaws with the experiment (but I have yet to see a better
>design)
>
>5.  The error bands in the strength figures are widely misunderstood.
>
>I don't know that you will agree with me, but I expect you can see what I am
>driving at by now.
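
Dann's error-bar arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not SSDF software: it assumes, as Dann does, that the printed +/- margins are one standard deviation, so doubling them gives roughly a 95% interval. The names `Rated`, `interval` and `overlaps` are hypothetical helpers introduced only for this example.

```python
"""Widen SSDF-style rating error bars and test whether two intervals overlap.

Assumption (Dann's reading in the thread, not verified here): the printed
+/- margins are one standard deviation, so k = 2 gives roughly 95%.
"""

from typing import NamedTuple


class Rated(NamedTuple):
    name: str
    rating: int
    plus: int   # upper margin as printed in the list
    minus: int  # lower margin, stored as a positive number (printed as -32 etc.)


def interval(entry: Rated, k: float = 2.0) -> tuple[float, float]:
    """Return (low, high) after scaling the printed margins by k."""
    return (entry.rating - k * entry.minus, entry.rating + k * entry.plus)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Figures taken from the quoted SSDF rows.
deep_fritz = Rated("Deep Fritz", 2650, 34, 32)
crafty = Rated("Crafty 17.07/CB", 2487, 24, 24)

df_lo, df_hi = interval(deep_fritz)
cr_lo, cr_hi = interval(crafty)
print(f"{deep_fritz.name}: {df_lo:.0f} .. {df_hi:.0f}")
print(f"{crafty.name}: {cr_lo:.0f} .. {cr_hi:.0f}")
print("intervals overlap:", overlaps((df_lo, df_hi), (cr_lo, cr_hi)))
```

With the Deep Fritz and Crafty 17.07 rows, this reproduces Dann's intervals (2586 to 2718 and 2439 to 2535) and reports that they do not overlap. Requiring non-overlapping 95% intervals is a conservative test: two ratings can differ significantly even when their intervals overlap somewhat.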

Hello Dann,
Of course I agree with your numbers, since they are based on solid math
principles.  The problem is that this same "excuse" has been used since I first
came here about 3.5 years ago and started testing Crafty 14.something.  It was
approximately 150 points behind the latest commercial programs then, and as near
as I can tell it's about the same distance behind now.  Of course you can use
the uncertainty factor to explain that they may be closer than the numbers show,
but that has been the same excuse for 3 years.  I think if anything the past 3
years have shown that the data is more reliable than you are willing to accept.
Yes, the commercial programs change every year or two and Crafty changes every
month or less, so you will always have this "excuse".  The problem with your
uncertainty factor is that the ratings have never merged as you indicate they
possibly could.

I have no doubt that Bob is leading the way in several areas of chess
programming, and a lot of programmers are following his lead in these areas, but
to me it is obvious by now that Crafty's overall strength is still lagging behind
the top commercial programs.  I have my own database of over 12000 games now,
several versions of Crafty are included, and the data shows that Crafty is
improving at about the same rate as the top commercial programs in computer vs.
computer games.  I don't have a clue about computer vs. human contests, since I
believe they are very different and do not necessarily follow the comp vs. comp
statistics.

In any case we can agree to disagree on this, and there is no need to argue
about it.  Believe me, if/when Crafty tops the SSDF list I will be one of the
loudest in cheering.  I will also cheer when Ferret becomes available to the
public.  (I will cheer for just about any advancement in computer chess :-) )
In the meantime I enjoy matching the programs against each other on equal
hardware and watching the fireworks.
Regards,
Jim
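
To put a rating gap like Jim's "150 points behind" in game terms, the common logistic Elo formula converts a rating difference into an expected score, and its inverse gives a rough performance rating from a score and the average opposition (which I take to be the "66%" and "2537" columns of the Deep Fritz row). This is a sketch assuming the standard 400-point logistic curve; the SSDF may compute its ratings differently, so results will not match the list exactly. `expected_score` and `performance_rating` are hypothetical names.

```python
"""Logistic Elo approximations: expected score from a rating gap, and a
rough performance rating from a score fraction and average opposition.

Assumption: the standard logistic Elo curve with a 400-point scale. The
SSDF list may be computed differently, so this will not match it exactly.
"""

import math


def expected_score(rating_gap: float) -> float:
    """Expected score (0..1) for a player rated `rating_gap` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def performance_rating(avg_opponent: float, score: float) -> float:
    """Invert the logistic curve: the rating that yields `score` vs avg_opponent."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


# A 150-point gap, as described for Crafty vs. the top commercial programs.
print(f"150 points -> {expected_score(150):.1%} expected score")
# Deep Fritz row: 66% against opposition averaging 2537.
print(f"Deep Fritz -> about {performance_rating(2537, 0.66):.0f}")
```

Under these assumptions a 150-point gap works out to roughly a 70% expected score for the stronger side, and the Deep Fritz row gives a performance figure of about 2652, close to the listed 2650.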




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.