Computer Chess Club Archives


Subject: Re: Finally we know how good Crafty is...

Author: James T. Walker

Date: 19:32:42 04/11/00



On April 11, 2000 at 14:14:25, Dann Corbit wrote:

>On April 11, 2000 at 10:37:22, James T. Walker wrote:
>>On April 10, 2000 at 16:37:24, Dann Corbit wrote:
>>>On April 10, 2000 at 16:12:03, James T. Walker wrote:
>>>>On April 10, 2000 at 11:24:18, blass uri wrote:
>>>>>On April 10, 2000 at 08:03:41, James T. Walker wrote:
>>>>>>On April 10, 2000 at 03:24:17, Bernhard Bauer wrote:
>>>>><snipped>
>>>>>>>Someone in the past stated that commercial programs are at least 100 rating
>>>>>>>points stronger than Crafty, so now we may ask:
>>>>>>>Where are the commercials???
>>>>>>
>>>>>>**************************************************
>>>>>>What am I missing??
>>>>>>
>>>>>>1 Fritz 6.0  128MB K6-2 450 MHz           2721   39   -37   368   67%  2594
>>>>>>8 Crafty 17.07/CB 128MB K6-2 450 MHz      2624   46   -44   251   62%  2541
>>>>>>                                          -----
>>>>>>                                            97  Points
>>>>>
>>>>>97 is not at least 100
>>>>>
>>>>>Uri
>>>>
>>>>Hello Uri,
>>>>I believe the difference is more than 100 points at blitz.  Also 97 points has a
>>>>tolerance which means it could be more or less.
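
As a rough illustration of what a 97-point gap means in game results, here is a minimal sketch using the common logistic form of the Elo expected-score formula. Whether the SSDF computes its list with exactly this curve is an assumption, and the function name is illustrative only.

```python
def expected_score(rating_diff):
    """Expected score of the higher-rated side for a given Elo gap.

    Uses the common logistic Elo curve (an assumption here; the SSDF's
    own calculation may differ in detail).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Ratings copied from the list entries quoted above.
fritz_6 = 2721
crafty_1707 = 2624

gap = fritz_6 - crafty_1707
print(f"gap = {gap} points, expected score = {expected_score(gap):.3f}")
```

Under this curve a 97-point gap corresponds to the higher-rated program scoring roughly 64% head to head.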
>Here are the top 9 entries from the list:
>  Program                                 Rating   +    -  Games  Won  Avg. opposition
>1 Fritz 6.0 128MB K6-2 450 MHz              2721  39  -37    368  67%             2594
>2 Junior 6.0 128MB K6-2 450 MHz             2689  31  -30    565  68%             2557
>3 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz   2671  33  -32    486  64%             2572
>4 Fritz 5.32 128MB K6-2 450 MHz             2654  34  -32    474  65%             2543
>5 Nimzo 7.32 128MB K6-2 450 MHz             2653  33  -32    478  65%             2545
>6 Junior 5.0 128MB K6-2 450 MHz             2626  35  -33    438  63%             2530
>6 Hiarcs 7.32 128MB K6-2 450 MHz            2626  35  -34    432  62%             2539
>8 Crafty 17.07/CB 128MB K6-2 450 MHz        2624  46  -44    251  62%             2541
>9 Nimzo 99 128MB K6-2 450 MHz               2623  40  -39    318  60%             2549
>
>Within a single standard deviation, this means that the ELO of the programs is:
>1. Fritz 6 (2760 - 2684)
>2. Junior 6 (2720 - 2659)
>3. CT 12 (2704 - 2639)
>4. Fritz 5.32 (2688 - 2622)
>5. Nimzo 7.32 (2686 - 2621)
>6. Junior 5.0 (2661 - 2593)
>6. Hiarcs 7.32 (2661 - 2592)
>8. Crafty 17.07/CB (2670 - 2580)
>9. Nimzo 99 (2663 - 2584)
>
>From this it is clear that, even within a single standard deviation, Crafty may
>be within a few ELO points of Fritz, and could even be stronger than all of the
>programs except Fritz.  Even that conclusion carries a certainty of only about
>2/3, because we are talking about a single standard deviation.  If we allow two
>standard deviations, Crafty could easily be the strongest program (or the
>weakest -- as you can see very easily, the uncertainty is great and very little
>separates the top programs).
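
The ranges in the list above can be reproduced, and the overlap with Crafty checked, with a short sketch like the following. The ratings and +/- margins are copied from the quoted table. The helper names `band` and `overlaps` are illustrative, and treating each margin as one standard deviation follows the quoted post rather than any SSDF documentation.

```python
# (name, rating, plus, minus) copied from the quoted list.
# The minus column is already a negative offset.
ENTRIES = [
    ("Fritz 6.0", 2721, 39, -37),
    ("Junior 6.0", 2689, 31, -30),
    ("Chess Tiger 12.0", 2671, 33, -32),
    ("Fritz 5.32", 2654, 34, -32),
    ("Nimzo 7.32", 2653, 33, -32),
    ("Junior 5.0", 2626, 35, -33),
    ("Hiarcs 7.32", 2626, 35, -34),
    ("Crafty 17.07/CB", 2624, 46, -44),
    ("Nimzo 99", 2623, 40, -39),
]


def band(rating, plus, minus):
    """Return (low, high) of the rating range implied by the margins."""
    return rating + minus, rating + plus


def overlaps(a, b):
    """True if two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


bands = {name: band(r, p, m) for name, r, p, m in ENTRIES}
crafty = bands["Crafty 17.07/CB"]

for name, (low, high) in bands.items():
    if name == "Crafty 17.07/CB":
        continue
    verdict = "overlaps" if overlaps(crafty, (low, high)) else "does not overlap"
    print(f"{name:18s} {high} - {low}  {verdict} Crafty {crafty[1]} - {crafty[0]}")
```

With these numbers, only the Fritz 6 range fails to overlap Crafty's, and the two ranges are 14 points apart.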
>
>>>The point is that Crafty is
>>>>still behind the top commercial programs.
>
>There is no evidence to support this in the SSDF.

************
The evidence is there.  YOU chose to ignore it because it does not support your
opinion.  It's early and more games are needed to solidify Crafty's place on the
list but the early signs are very much in line with my test.
************

>>>>Because they only come out once each
>>>>year, Crafty appears to be catching up more quickly than it really is.
>
>Upon what evidence do you base this vacuous argument?  Some real data
>please.

*******************
I have already said, I have a database of over 11,000 games now and it is still
growing.  Last June I archived about 7000 games and started a new database which
now contains over 4000 games.  Of these games Crafty has played about 700 vs the
top commercial programs.  Crafty does _n_o_t_ have a winning percentage against
any of the programs. (Fritz5.32/Fritz6/Junior5/Junior6/Hiarcs 7.32/Nimzo99)
Nimzo99 is the weakest at blitz and Crafty may well be stronger than Nimzo99 now
because I have not tested Nimzo99 lately.  Since I have tested many different
versions of Crafty it is evident that it is getting stronger.  The latest
version I have tested is Crafty 17.07 in the Chessbase GUI.  The earlier
versions were WCrafty using Remi's autoplayer232 in a DOS window.
*******************

>>>>The
>>>>improvement in Fritz this time was really great.  Crafty is still behind but
>>>>with Bob working overtime, it is slowly closing the gap.  One thing for sure,
>>>>you can't complain about the price of Crafty.  I really like Crafty in the
>>>>Chessbase interface.  It seems that all the problems have been fixed as far as
>>>>Chessbase/Crafty goes.  Crafty seems to score better using the Fritz book too.
>>>
>>>Since you have used "programs" it shows that you do not fully understand the
>>>publication by the SSDF.  Within experimental uncertainty (which they do list),
>>>crafty is clearly as good as many of the commercial programs.
>>******************
>>To tell someone that they don't understand something as simple as the SSDF list
>>is pretty silly.  You are taking on an "Air" of a know it all.
>
>Well, you clearly don't understand it.  You quoted the SSDF figure to support
>your argument and your 'explanation' shows that you don't know what the numbers
>mean.

**********
As I said, I am relying on the SSDF list, my 11,000+ game database, and watching
many games on ICC/FICS.
I know that you are "The Great Defender" of Crafty.  I also like Crafty.  It is
a great program.  But none of the data I have seen indicates it has caught up
with the latest commercial programs.  If you have any I would like to hear about
it.
**********

>>In the above
>>case I was also talking about Blitz games.
>
>Yes.  You did say that you thought the difference was more than 100 points at
>blitz.  However, I seriously doubt that you have the same type of quality
>control that the SSDF does.  Were all the crafty games played with the same
>version of crafty under identical conditions?

***************
Why do you doubt my statements?  Is it only because they do not support your own
opinion?  Do you know of any time in which I lied about any of the data I have
posted here?  Frankly I believe my quality control is better than the SSDF's.
Since I spent the last 15 years of my career as a Quality Assurance
Representative in a Precision Measurements Laboratory, I think I have just an
inkling of what I'm doing and also an elementary understanding of statistical
analysis.  One of the things that I believe SSDF is doing wrong is not playing
an equal number of games vs all opponents.  The reason I think this is that
Program A may do well against Program B but not as well against program C even
though program B has a higher SSDF rating than program C.  If you accept this as
fact then you can see that by playing more games against an opponent you
perform well against, and fewer games against an opponent which gives you
trouble, you can elevate your rating to a false value.  I'm not sure
this is fact but in my opinion, to prevent this from happening, they
should play an equal number of games vs all opponents.  Of course I have not
played the same version of Crafty all the time.  I have tested new versions as
they appear but only about every 5th version.  I have tested all versions using
Hash=48M,Hashp=16M,Cache=10M when applicable.  It is my understanding that these
values are acceptable for Crafty.  I have played all commercial programs using
default settings except hash settings which have been consistent for each
program.
****************
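
The concern about unequal game counts can be illustrated with a small sketch. All numbers below are hypothetical and are not taken from the SSDF list or from any database mentioned here. The sketch shows only how the pooled score shifts with the mix of opponents. A real rating calculation also weighs the opponents' ratings, so the effect on the final rating may be smaller or larger than this suggests.

```python
def pooled_score(results):
    """Overall score fraction from per-opponent results.

    results: list of (games_played, score_fraction) pairs, one per opponent.
    """
    total_games = sum(games for games, _ in results)
    total_points = sum(games * score for games, score in results)
    return total_points / total_games


# Hypothetical: program A scores 70% against B but only 40% against C.
equal_mix = pooled_score([(200, 0.70), (200, 0.40)])   # same games vs each
skewed_mix = pooled_score([(300, 0.70), (100, 0.40)])  # more games vs B

print(f"equal game counts:  {equal_mix:.3f}")
print(f"skewed game counts: {skewed_mix:.3f}")
```

In this made-up case the same 400 games give a pooled score of 55% with equal game counts and 62.5% when three quarters of the games are played against the easier opponent.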

>>I have more than 11,000 games in my
>>database now played between some of the top programS.  I am making my statement
>>based also on these games which I have watched.  I believe my certainty about
>>Crafty's rating is slightly better than just the SSDF list alone.
>
>I doubt very much if this is true.  However, I will admit it is possible that
>you are correct.  I suspect that you cannot defend your position mathematically.

************
Again you are expressing doubt but you have nothing to base _y_o_u_r_ doubt on.
I am simply not supporting your opinion, and that makes me suspect in your eyes.  How many
games have you played with Crafty vs top commercial programs?
************

>>I have also
>>watched Crafty play on ICC/FICS using the quad vs Fritz/Junior.  In my opinion
>>Crafty is still behind the top programS.  I believe I'm entitled to my opinion
>>which is based on my experience.
>
>Of course.  All opinions are equal.  Mine is no better than yours.  However,
>mathematics can put more weight upon our arguments than feelings can.

*************
For a quick education, go to ICC and do a search on "TheComputer" vs "Crafty".
See if you can tell anything from the "Trends".  Guess when "TheComputer"
started using Fritz6 if you can.  I'll bet it's pretty easy.  But of course none
of this supports your opinion.  By the way I believe that "TheComputer" is
running on a K6-3-450.
*************


>> The list below has nothing to do with my
>>statement concerning Crafty's strength relative to the top programS so I will
>>just ignore it as irrelevant to the topic of discussion.
>[snipped]




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.