Computer Chess Club Archives


Subject: Re: Finally we know how good Crafty is...

Author: Dann Corbit

Date: 10:57:42 04/12/00


On April 11, 2000 at 22:32:42, James T. Walker wrote:
>On April 11, 2000 at 14:14:25, Dann Corbit wrote:
>>On April 11, 2000 at 10:37:22, James T. Walker wrote:
>>>On April 10, 2000 at 16:37:24, Dann Corbit wrote:
>>>>On April 10, 2000 at 16:12:03, James T. Walker wrote:
>>>>>On April 10, 2000 at 11:24:18, blass uri wrote:
>>>>>>On April 10, 2000 at 08:03:41, James T. Walker wrote:
>>>>>>>On April 10, 2000 at 03:24:17, Bernhard Bauer wrote:
>>>>>><snipped>
>>>>>>>>Someone in the past stated that commercial programs are at least 100 rating
>>>>>>>>points stronger than Crafty, so now we may ask:
>>>>>>>>Where are the commercials???
>>>>>>>
>>>>>>>**************************************************
>>>>>>>What am I missing??
>>>>>>>
>>>>>>>1 Fritz 6.0  128MB K6-2 450 MHz           2721   39   -37   368   67%  2594
>>>>>>>8 Crafty 17.07/CB 128MB K6-2 450 MHz      2624   46   -44   251   62%  2541
>>>>>>>                                          -----
>>>>>>>                                            97  Points
>>>>>>
>>>>>>97 is not at least 100
>>>>>>
>>>>>>Uri
>>>>>
>>>>>Hello Uri,
>>>>>I believe the difference is more than 100 points at blitz.  Also 97 points has a
>>>>>tolerance which means it could be more or less.
>>Here are the top 9 entries from the list:
>>                                           Rating    +    - Games  Won Avg opp
>>1 Fritz 6.0 128MB K6-2 450 MHz               2721   39  -37   368  67%    2594
>>2 Junior 6.0 128MB K6-2 450 MHz              2689   31  -30   565  68%    2557
>>3 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz    2671   33  -32   486  64%    2572
>>4 Fritz 5.32 128MB K6-2 450 MHz              2654   34  -32   474  65%    2543
>>5 Nimzo 7.32 128MB K6-2 450 MHz              2653   33  -32   478  65%    2545
>>6 Junior 5.0 128MB K6-2 450 MHz              2626   35  -33   438  63%    2530
>>6 Hiarcs 7.32 128MB K6-2 450 MHz             2626   35  -34   432  62%    2539
>>8 Crafty 17.07/CB 128MB K6-2 450 MHz         2624   46  -44   251  62%    2541
>>9 Nimzo 99 128MB K6-2 450 MHz                2623   40  -39   318  60%    2549
>>
>>Within a single standard deviation, this means that the ELO of the programs is:
>>1. Fritz 6 (2760 - 2684)
>>2. Junior 6 (2720 - 2659)
>>3. CT 12 (2704 - 2639)
>>4. Fritz 5.32 (2688 - 2622)
>>5. Nimzo 7.32 (2686 - 2621)
>>6. Junior 5.0 (2661 - 2593)
>>6. Hiarcs 7.32 (2661 - 2592)
>>8. Crafty 17.07/CB (2670 - 2580)
>>9. Nimzo 99 (2663 - 2584)
>>
>>From this it is clear that even with a single standard deviation, crafty may be
>>within a few ELO points of Fritz, and could even be stronger than all of the
>>programs except Fritz.  This gives a certainty of only about 2/3, even at that
>>because we are talking about a single standard deviation.  If we allow two
>>standard deviations, Crafty could easily be the strongest program (or the
>>weakest -- as you can see very easily, the uncertainty is great and very little
>>separates the top programs).
>>
>>>>>The point is that Crafty is
>>>>>still behind the top commercial programs.
>>
>>There is no evidence to support this in the SSDF.
>
>************
>The evidence is there.  YOU chose to ignore it because it does not support your
>opinion.  It's early and more games are needed to solidify Crafty's place on the
>list but the early signs are very much in line with my test.

Please describe in your own words how the SSDF list supports your assertion.
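
For anyone who wants to check the ranges I quoted, here is a rough Python sketch of the arithmetic. It follows my reading of the published + and - columns as one standard deviation, which is an assumption on my part, and the helper names (interval, overlaps, overlapping_with) and the scale factor k are made up for illustration. It is not how the SSDF computes anything; it only adds and subtracts the listed margins and checks which ranges overlap.

```python
# Ratings and error margins copied from the SSDF excerpt quoted above.
# Tuple layout: (name, rating, plus_margin, minus_margin)
ENTRIES = [
    ("Fritz 6.0", 2721, 39, 37),
    ("Junior 6.0", 2689, 31, 30),
    ("Chess Tiger 12.0", 2671, 33, 32),
    ("Fritz 5.32", 2654, 34, 32),
    ("Nimzo 7.32", 2653, 33, 32),
    ("Junior 5.0", 2626, 35, 33),
    ("Hiarcs 7.32", 2626, 35, 34),
    ("Crafty 17.07/CB", 2624, 46, 44),
    ("Nimzo 99", 2623, 40, 39),
]


def interval(rating, plus, minus, k=1):
    """Return (low, high) after scaling the listed margins by k.

    ASSUMPTION: k=1 treats the listed margins as one standard deviation,
    and k=2 simply doubles them.
    """
    return rating - k * minus, rating + k * plus


def overlaps(a, b):
    """True if the closed ranges a=(low, high) and b=(low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_with(name, k=1):
    """Names of all other entries whose range intersects the range of `name`."""
    ranges = {n: interval(r, p, m, k) for n, r, p, m in ENTRIES}
    target = ranges[name]
    return [n for n in ranges if n != name and overlaps(target, ranges[n])]


if __name__ == "__main__":
    for n, r, p, m in ENTRIES:
        low, high = interval(r, p, m)
        print(f"{n:18s} ({high} - {low})")
    print("k=1:", overlapping_with("Crafty 17.07/CB", k=1))
    print("k=2:", overlapping_with("Crafty 17.07/CB", k=2))
```

With k=1, Crafty's range intersects every entry except Fritz 6.0, whose lower bound sits a few points above Crafty's upper bound. With k=2, it intersects Fritz 6.0 as well.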

>************
>
>>>>>Because they only come out once each
>>>>>year, Crafty appears to be catching up more quickly than it really is.
>>
>>On what evidence do you base this vacuous argument?  Some real data
>>please.
>
>*******************
>I have already said, I have a database of over 11,000 games now and it is still
>growing.  Last June I archived about 7000 games and started a new database which
>now contains over 4000 games.  Of these games Crafty has played about 700 vs the
>top commercial programs.  Crafty does _n_o_t_ have a winning percentage against
>any of the programs. (Fritz5.32/Fritz6/Junior5/Junior6/Hiarcs 7.32/Nimzo99)
>Nimzo99 is the weakest at blitz and Crafty may well be stronger than Nimzo99 now
>because I have not tested Nimzo99 lately.  Since I have tested many different
>versions of Crafty it is evident that it is getting stronger.  The latest
>version I have tested is Crafty 17.07 in the Chessbase GUI.  The earlier
>versions were WCrafty using Remi's autoplayer232 in a DOS window.

When you say "Crafty," were all of the games played with the same version on the
same machine under identical conditions?  Did you use the anti-computer opening
book?

>*******************
>
>>>>>The
>>>>>improvement in Fritz this time was really great.  Crafty is still behind but
>>>>>with Bob working overtime, it is slowly closing the gap.  One thing for sure,
>>>>>you can't complain about the price of Crafty.  I really like Crafty in the
>>>>>Chessbase interface.  It seems that all the problems have been fixed as far as
>>>>>Chessbase/Crafty goes.  Crafty seems to score better using the Fritz book too.
>>>>
>>>>Since you have used "programs" it shows that you do not fully understand the
>>>>publication by the SSDF.  Within experimental uncertainty (which they do list),
>>>>crafty is clearly as good as many of the commercial programs.
>>>******************
>>>To tell someone that they don't understand something as simple as the SSDF list
>>>is pretty silly.  You are taking on an "Air" of a know it all.
>>
>>Well, you clearly don't understand it.  You quoted the SSDF figure to support
>>your argument and your 'explanation' shows that you don't know what the numbers
>>mean.
>
>**********
>As I said, the SSDF list and my 11,000+ game database and watching many games on
>ICC/FICS.
>I know that you are "The Great Defender" of Crafty.  I also like Crafty.  It is
>a great program.  But none of the data I have seen indicates it has caught up
>with the latest commercial programs.  If you have any I would like to hear about
>it.

The SSDF list clearly indicates that Crafty is a peer of the best commercial
programs at 40/2 on 450 MHz machines using autoplayer.

The recent CCC contest had Crafty a clear winner over all programs, including
commercial ones.

Ed's Chess in 2010 had Crafty as better than all the commercial programs.

Crafty did well in the KKUP contests, and has also done well at WMCCC events.

>**********
>
>>>In the above
>>>case I was also talking about Blitz games.
>>
>>Yes.  You did say that you thought the difference was more than 100 points at
>>blitz.  However, I seriously doubt that you have the same type of quality
>>control that the SSDF does.  Were all the crafty games played with the same
>>version of crafty under identical conditions?
>
>***************
>Why do you doubt my statements?  Is it only because they do not support your own
>opinion?  Do you know of any time in which I lied about any of the data I have
>posted here?

I don't doubt your integrity for a moment, nor do I think that you have
intentionally added even a germ of untruth to your statements.  But since you
don't understand the SSDF results, I find it hard to imagine that your results
are "better."

> Frankly I believe my quality control is better than the SSDF.
>Since I spent the last 15 years of my career as a Quality Assurance
>Representative in a Precision Measurements Laboratory, I think I have just an
>inkling of what I'm doing and also an elementary understanding of statistical
>analysis.  One of the things that I believe SSDF is doing wrong is not playing
>an equal number of games vs all opponents.  The reason I think this is because
>Program A may do well against Program B but not as well against program C even
>though program B has a higher SSDF rating than program C.  If you accept this as
>fact then you can see that if you play more games against an opponent which you
>perform better against then you can elevate your rating to a false value while
>playing fewer games against an opponent which gives you trouble.  I'm not sure
>this is fact but in my opinion to prevent this from happening I think they
>should play an equal number of games vs all opponents.  Of course I have not
>played the same version of Crafty all the time.  I have tested new versions as
>they appear but only about every 5th version.  I have tested all versions using
>Hash=48M,Hashp=16M,Cache=10M when applicable.  It is my understanding that these
>values are acceptable for Crafty.  I have played all commercial programs using
>default settings except hash settings which have been consistent for each
>program.

If you could be so kind as to send me your data, I would like to analyze them
myself.  I am ready to concede that I am wrong if a careful statistical analysis
shows that I am mistaken.
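
To make the unequal-games concern above concrete, here is a small Python sketch. Everything in it is hypothetical: the opponent ratings, the score fractions, and the game counts are invented, and the performance formula is the common logistic Elo approximation, which I am assuming for illustration only. It is not necessarily what the SSDF uses.

```python
import math


def performance_rating(results):
    """Logistic Elo performance estimate (an assumed formula, for illustration).

    results: list of (opponent_rating, games, score_fraction) tuples.
    The overall score fraction must lie strictly between 0 and 1.
    """
    games = sum(g for _, g, _ in results)
    avg_opp = sum(r * g for r, g, _ in results) / games
    score = sum(s * g for _, g, s in results) / games
    return avg_opp + 400 * math.log10(score / (1 - score))


# HYPOTHETICAL program A: scores 70% against B (rated 2650)
# but only 45% against C (rated 2600).
equal_mix = [(2650, 100, 0.70), (2600, 100, 0.45)]
skewed_mix = [(2650, 300, 0.70), (2600, 100, 0.45)]  # more games vs. the easier matchup

if __name__ == "__main__":
    print("equal mix :", round(performance_rating(equal_mix)))
    print("skewed mix:", round(performance_rating(skewed_mix)))
```

Under these made-up numbers the skewed mix gives the higher estimate, which is the effect described. Whether the real SSDF pairings are lopsided enough to matter is exactly what the actual data would have to show.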

>****************
>
>>>I have more than 11,000 games in my
>>>database now played between some of the top programS.  I am making my statement
>>>based also on these games which I have watched.  I believe my certainty about
>>>Crafty's rating is slightly better than just the SSDF list alone.
>>
>>I doubt very much if this is true.  However, I will admit it is possible that
>>you are correct.  I suspect that you cannot defend your position mathematically.
>
>************
>Again you are expressing doubt but you have nothing to base _y_o_u_r_ doubt on.
>I simply am not supporting your opinion and that makes me doubtful.  How many
>games have you played with Crafty vs top commercial programs?

Only a very few, and I have not run any statistics on those games.  I rely upon
the experimental results of others that I have seen for this sort of thing.  I
generally do formal tests only of the freely available engines.

>************
>
>>>I have also
>>>watched Crafty play on ICC/FICS using the quad vs Fritz/Junior.  In my opinion
>>>Crafty is still behind the top programS.  I believe I'm entitled to my opinion
>>>which is based on my experience.
>>
>>Of course.  All opinions are equal.  Mine is no better than yours.  However,
>>mathematics can put more weight upon our arguments than feelings can.
>
>*************
>For a quick education, go to ICC and do a search on "TheComputer" vs "Crafty".
>See if you can tell anything from the "Trends".  Guess when "TheComputer"
>started using Fritz6 if you can.  I'll bet it's pretty easy.  But of course none
>of this supports your opinion.

To try and make this sort of comparison is laughable.  Where are the controlled
conditions of a scientific experiment?

> By the way I believe that "TheComputer" is
>running on a K6-3-450.
>*************
>
>
>>> The list below has nothing to do with my
>>>statement concerning Crafty's strength relative to the top programS so I will
>>>just ignore it as irrelevant to the topic of discussion.
>>[snipped]

I apologize for my tone and for the fact that I have come across harshly.

Your statement, "The point is that Crafty is still behind the top commercial
programs," is simply untrue.  Now, behind 'in features' I will heartily agree.
In fact my lists showed that to be obvious and self-evident.  It may be that
there are some conditions for which Crafty cannot compete with commercial
programs on strength.  Whatever set of conditions that is will probably be
rather uninteresting to me, but I would still like to know about it.





Last modified: Thu, 15 Apr 21 08:11:13 -0700
