Computer Chess Club Archives



Subject: Re: Crafty Final Scores + they're back!!

Author: pavel

Date: 21:40:26 02/01/02



On February 02, 2002 at 00:19:03, Dann Corbit wrote:

>On February 01, 2002 at 23:09:05, pavel wrote:
>>On February 01, 2002 at 22:45:55, Dann Corbit wrote:
>>>On February 01, 2002 at 22:35:20, pavel wrote:
>>>>On February 01, 2002 at 22:08:56, Tina Long wrote:
>>>>>On February 01, 2002 at 21:39:41, pavel wrote:
>>>>>>crafty-1 home  2002
>>>>>>
>>>>>>
>>>>>>1   Crafty 17.14  2500   +85  34.5/56
>>>>>>2   Crafty 17.11  2500   +72  33.5/56
>>>>>>3   Crafty 18.11  2500   +65  33.0/56  918.00
>>>>>>4   Crafty 18.12  2500   +65  33.0/56  887.00
>>>>>>5   Crafty 18.01  2500   +58  32.5/56
>>>>>>6   Crafty 17.02  2500   +45  31.5/56  847.75
>>>>>>7   Crafty 18.13  2500   +45  31.5/56  847.50
>>>>>>8   Crafty 18.10  2500   +45  31.5/56  846.00
>>>>>>9   Crafty 17.01  2500   +39  31.0/56  855.50
>>>>>>10  Crafty 18.03  2500   +39  31.0/56  849.75
>>>>>....
>>>>>
>>>>>Keeping in mind the "worth" of these results:
>>>>>
>>>>>Good work, Pavel; a very interesting tournament, and I hope Jonas does follow up on
>>>>>this (I would suggest he use your numbers 1, 2, 4, 5, 6 and 7 against whatever he
>>>>>likes; JMO).
>>>>>
>>>>>I wonder now whether, when I recently quoted 17.16 as the best, I had recalled
>>>>>wrongly and meant 17.14 (particularly as 17.16 doesn't exist).
>>>>>
>>>>>Anyway, until I see Jonas' results, I'm moving my 18.13 aside and installing 17.14.
>>>>>
>>>>>Incidentally: http://www.chessbase.de./download/index.asp?cat=Engines
>>>>>
>>>>>"Search in this category"
>>>>>Search for Crafty, and http://www.chessbase.de./download/searchresult.asp
>>>>>gives all the Comets Craftys and the Bam Bam.
>>>>>
>>>>>Keep it up & please keep us informed,
>>>>>
>>>>>Tina
>>>>
>>>>
>>>>I don't know for sure how much you can trust this score, but it can't be way
>>>>off, since it has been discussed (or tested) several times before that
>>>>these versions (17.14/17.11) are strong(er).
>>>>
>>>>Anyway, I am interested in Jonas' test because it will be on a different
>>>>platform, with ponder=on, a different time control and different engines.
>>>>
>>>>Summing it all up, it should be interesting for crafty fans. :)
>>>>
>>>>But again (if I know people over here well enough), someone will come up with
>>>>coin tossing and try to prove that he is a good statistician (ridiculously
>>>>relating that to chess), and others will come up with "ifs" and "buts" and
>>>>"mores" and "duhs" ;).
>>>>
>>>>Oh, and of course another good conclusion could be that the latest Crafty
>>>>versions are tweaked for humans and do not play as well against comps....
>>>
>>>The number 1 entrant in the best crafty sweepstakes:
>>>>>>1   Crafty 17.14  2500   +85  34.5/56
>>>
>>>The number 7 entrant in the best crafty sweepstakes (current version):
>>>>>>7   Crafty 18.13  2500   +45  31.5/56  847.50
>>>
>>>Notice that #7 scored 31.5/56 and number 1 scored 34.5/56.  There is absolutely
>>>no statistical significance to that result.  A whopping 3 more points in 56
>>>games.
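One way to put a rough number on this: treat each engine's score as a sample proportion and compute a two-sample z statistic for the 3-point gap. This is a sketch under simplifying assumptions: it treats the two scores as independent samples, which is only approximately true in a round robin where both versions met the same field and each other, and it uses p*(1-p) as the per-game variance, which overstates the spread when there are draws. The function names are illustrative.

```python
import math


def score_se(points: float, games: int) -> float:
    """Standard error of a score fraction, using p*(1-p) as the per-game variance.

    Draws make the true per-game variance smaller, so this is an upper bound.
    """
    p = points / games
    return math.sqrt(p * (1.0 - p) / games)


def z_for_gap(points_a: float, points_b: float, games: int) -> float:
    """Two-sample z statistic for the difference between two scores.

    Assumes the two scores are independent samples of the same size.
    """
    gap = (points_a - points_b) / games
    se = math.hypot(score_se(points_a, games), score_se(points_b, games))
    return gap / se


# Crafty 17.14 (34.5/56) versus Crafty 18.13 (31.5/56), from the table above.
z = z_for_gap(34.5, 31.5, 56)
print(f"z = {z:.2f}")  # prints z = 0.58
```

A z of about 0.58 is far below the 1.96 usually required for 95% confidence; only a very high draw rate would shrink the variance enough to change that picture.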
>>>
>>>When programs are evenly matched, that is (paradoxically) the hardest situation
>>>to discern which one really is stronger.  It would take hundreds of thousands of
>>>games to be fairly certain.  It would take at least one thousand games to even
>>>have a good idea which is stronger.
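The "thousand games" figure can be checked with a back-of-the-envelope sketch. Under the common logistic Elo model, a rating edge of D points corresponds to an expected score of 1/(1+10^(-D/400)). The sketch asks how many games it takes before a 95% band around a 50% score no longer covers that expected score. It assumes a fixed-length match and a worst-case per-game standard deviation of 0.5 (no draws); draws would lower the counts somewhat. The helper names are hypothetical.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, z: float = 1.96, sd: float = 0.5) -> int:
    """Games until a z-sigma band around 50% stops covering the expected score.

    sd=0.5 is the worst-case per-game standard deviation (no draws).
    """
    edge = expected_score(elo_diff) - 0.5
    if edge <= 0:
        raise ValueError("elo_diff must be positive")
    return math.ceil((z * sd / edge) ** 2)


for diff in (100, 50, 20, 10, 5):
    print(f"{diff:>3} Elo: about {games_needed(diff)} games")
```

Under these assumptions a 100-point gap shows up within a few dozen games, a 20-point gap needs on the order of a thousand, and the count roughly quadruples each time the gap is halved.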
>>>
>>>But if it enhances the feeling of security, pick whichever one you like best.
>>>Just be aware that the data on this list gives no logical reason to prefer one
>>>choice over another.
>>>
>>>Please think back to the Junior Fritz match.  What if the see-saw battle were
>>>cut off early?  A large amount of variation is not unusual.
>>
>>
>>I agree that a "decent" number of games is necessary to have a "general" idea,
>>but I don't believe one needs to play thousands of games to draw any kind of
>>conclusion. I have played some 1000-game matches myself, and I posted them on
>>the Winboard forum, if you remember. They were between Yace and other engines
>>(Gandalf, I think). From those 1000-game experiences (several), I can tell you
>>that I didn't need that many games to conclude that Yace would be better,
>>because after 500 games virtually nothing changed as far as the score was
>>concerned, or the difference was so small that it was too dubious to take into
>>account.
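A sketch of why a score can look settled after 500 games while still leaving room for doubt: each additional game moves the running percentage by less than 1/n, yet the statistical margin shrinks only with 1/sqrt(n). The helper below converts the 95% margin around a 50% score into Elo using the logistic formula, again assuming a worst-case per-game standard deviation of 0.5. The function names are illustrative.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction 0 < p < 1 (logistic model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_margin(games: int, z: float = 1.96, sd: float = 0.5) -> float:
    """Approximate half-width, in Elo, of a z-sigma interval around a 50% score."""
    half_width = z * sd / math.sqrt(games)
    if half_width >= 0.5:
        return math.inf  # too few games: the band covers every possible score
    return elo_from_score(0.5 + half_width)


for n in (56, 500, 1000):
    print(f"{n:>4} games: about +/-{elo_margin(n):.0f} Elo")
```

Under these assumptions, 56 games leave a margin of roughly 90 Elo either way and 500 games still about 30 Elo, so a small gap that stops moving after 500 games may be real or may be noise.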
>>
>>IMO at some point it is necessary to draw a line. I don't need to play
>>bazillions of games between Fritz7 and Junior7 to conclude that Fritz7
>>is better than Junior7, because this is a fact and has been proven in almost all
>>accounts of tournaments and matches played by so many members of this forum. But
>>naturally, if you play 10 games and conclude that program x is better
>>than program y (when it is well known that the strength difference between these
>>programs is small), that is perhaps not acceptable. There is no need to toss
>>coins around to find that out; it is common sense.
>>
>>Another aspect to consider is that, IMO, statistics perhaps don't relate to
>>chess games, because there can be several reasons for different kinds of results
>>in chess matches (tournaments), e.g., bad opening lines, operator mistakes, slow
>>computers, program bugs, the OS, insufficient memory and so on....
>
>Statistics relate to everything.  To whether the light comes on when I turn the
>switch.  To whether my car starts.  To whether I get into an accident on the way
>to work.  Everything.

OK, perhaps I didn't say it the way I wanted to.
What I wanted to say is that chess games cannot be compared to tossing coins (no,
I didn't mean stats; tossing coins is what I meant), because when tossing
coins there are 2 possible outcomes, and nothing makes one of them more likely
than the other (why would there be more heads than tails?).
In chess this is not the case.
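In statistical terms, the difference described here can be handled by replacing the fair coin with a three-sided, possibly loaded one: each game is a win, draw or loss with some probabilities, and the factors listed earlier in the thread (openings, hardware, bugs, operator error) feed into those probabilities. The sketch below computes the mean and variance of one game's score under that model; the probabilities in the usage lines are illustrative assumptions, not measured values.

```python
def per_game_stats(p_win: float, p_draw: float) -> tuple[float, float]:
    """Mean and variance of one game's score (1, 0.5 or 0) for given outcome odds."""
    p_loss = 1.0 - p_win - p_draw
    if min(p_win, p_draw, p_loss) < -1e-12:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    mean = p_win * 1.0 + p_draw * 0.5
    # E[score^2]: a win contributes 1^2, a draw 0.5^2, a loss 0
    second_moment = p_win * 1.0 + p_draw * 0.25
    return mean, second_moment - mean ** 2


print(per_game_stats(0.5, 0.0))  # fair coin: (0.5, 0.25)
print(per_game_stats(0.3, 0.4))  # evenly matched with 40% draws: same mean, smaller variance
```

Draws lower the per-game variance, so, all else being equal, the same statistics apply to chess with somewhat fewer games needed than for a pure win/loss coin.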



>
>>I can assure you that the majority of people around here can generate results
>>(almost) comparable to the SSDF's, and most of the time they don't run on two
>>computers, nor do they play at the same time control, but still they almost
>>always replicate the results.
>
>If one program is far stronger than the other, you can find out pretty quickly.
>If they are of the same strength, it is a fairly random walk, and takes a very
>long sequence of trials to determine the outcome.
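The "random walk" can be made concrete with a small simulation, sketched under illustrative assumptions: two engines of exactly equal strength, where each game is a win for A with probability 0.30, a draw with 0.40 and a loss with 0.30 (made-up numbers, not measured from any engine). The walk tracks A's lead in points after each game; running it with several seeds shows how far equal engines can drift apart over 56 games. The function name is hypothetical.

```python
import random


def simulate_match(games: int, p_win: float, p_draw: float, seed: int = 0) -> list[int]:
    """A's running lead in points after each game of a simulated match.

    A win moves the walk up by 1, a loss down by 1, and a draw leaves it
    unchanged, so the value equals wins minus losses for engine A.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    lead, walk = 0, []
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            lead += 1
        elif r >= p_win + p_draw:
            lead -= 1
        # otherwise the game is a draw and the lead is unchanged
        walk.append(lead)
    return walk


for seed in range(5):
    walk = simulate_match(56, p_win=0.30, p_draw=0.40, seed=seed)
    print(f"seed {seed}: final lead {walk[-1]:+d}, largest lead {max(walk, key=abs):+d}")
```

Because both engines are equally strong by construction, any lead the simulation shows is pure chance.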
>
>>IMO it's not always about tossing coins (not being as intelligent as most of
>>you guys, I might be wrong), because there are many variables involved.
>>
>>Sooner or later it's the same highway.
>
>However, our intuition leads us astray quite often.

Comparing chess games to tossing coins just doesn't seem right.

pavs




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.