Author: Dann Corbit
Date: 21:19:03 02/01/02
On February 01, 2002 at 23:09:05, pavel wrote:

>On February 01, 2002 at 22:45:55, Dann Corbit wrote:
>>On February 01, 2002 at 22:35:20, pavel wrote:
>>>On February 01, 2002 at 22:08:56, Tina Long wrote:
>>>>On February 01, 2002 at 21:39:41, pavel wrote:
>>>>>crafty-1 home 2002
>>>>>
>>>>>
>>>>> 1 Crafty 17.14   2500  +85  34.5/56
>>>>> 2 Crafty 17.11   2500  +72  33.5/56
>>>>> 3 Crafty 18.11   2500  +65  33.0/56  918.00
>>>>> 4 Crafty 18.12   2500  +65  33.0/56  887.00
>>>>> 5 Crafty 18.01   2500  +58  32.5/56
>>>>> 6 Crafty 17.02   2500  +45  31.5/56  847.75
>>>>> 7 Crafty 18.13   2500  +45  31.5/56  847.50
>>>>> 8 Crafty 18.10   2500  +45  31.5/56  846.00
>>>>> 9 Crafty 17.01   2500  +39  31.0/56  855.50
>>>>>10 Crafty 18.03   2500  +39  31.0/56  849.75
>>>>....
>>>>
>>>>Keeping in mind the "worth" of these results:
>>>>
>>>>Good work Pavel, a very interesting tournament & I hope Jonas does follow up on
>>>>this (I would suggest he use your rankings 1,2,4,5,6,7, against whatever he
>>>>likes - JMO)
>>>>
>>>>I wonder now, when I quoted 17.16 as the best recently, if I had recalled
>>>>wrongly & meant 17.14 (particularly as 17.16 doesn't exist)
>>>>
>>>>Anyway, until I see Jonas' results I'm moving my 18.13 aside & installing 17.14.
>>>>
>>>>Incidentally: http://www.chessbase.de./download/index.asp?cat=Engines
>>>>
>>>>"Search in this category"
>>>>Search for Crafty, & http://www.chessbase.de./download/searchresult.asp
>>>>gives all the Comets, Craftys & the Bam Bam.
>>>>
>>>>Keep it up & please keep us informed,
>>>>
>>>>Tina
>>>
>>>
>>>I don't know for sure how much you can trust this score, but it can't be way
>>>off, since this has been discussed (or tested) several times before that
>>>these versions (17.14/17.11) are strong(er).
>>>
>>>Anyway, I am interested in Jonas' test because it will be on a different
>>>platform, with ponder=on, a different time control and different engines.
>>>
>>>Summing it all up, it should be interesting for crafty fans. :)
>>>
>>>But again (if I know people over here well enough), someone will come up with
>>>tossing coins and try to prove that he is a good statistician (and ridiculously
>>>relating that to chess), others will come with "ifs" and "buts" and "mores" and
>>>"duhs" ;).
>>>
>>>Oh, and of course another good conclusion could be that the latest crafty
>>>versions are tweaked for humans and do not play as well against comps....
>>
>>The number 1 entrant in the best crafty sweepstakes:
>>>>>1 Crafty 17.14 2500 +85 34.5/56
>>
>>The number 7 entrant in the best crafty sweepstakes (current version):
>>>>>7 Crafty 18.13 2500 +45 31.5/56 847.50
>>
>>Notice that #7 scored 31.5/56 and number 1 scored 34.5/56. There is absolutely
>>no statistical significance to that result. A whopping 3 more points in 56
>>games.
>>
>>When programs are evenly matched, that is (paradoxically) the hardest situation
>>in which to discern which one really is stronger. It would take hundreds of
>>thousands of games to be fairly certain. It would take at least one thousand
>>games to even have a good idea which is stronger.
>>
>>But if it enhances the feeling of security, pick whichever one you like best.
>>Just be aware that there is no logical reason for one choice over another from
>>the data on this list.
>>
>>Please think back to the Junior Fritz match. What if the see-saw battle were
>>cut off early? A large amount of variation is not unusual.
>
>
>I agree that a "decent" number of games is necessary to have a "general" idea,
>but I don't believe one needs to play 1000s of games to draw any kind of
>conclusion. I have played some 1000-game matches myself, and I posted them on
>the winboard forum, if you remember. It was between yace and other engines
>(gandalf I think). From those 1000-game experiences (several) I can tell you
>that I didn't need that many games to come to the conclusion that yace will be
>better, because after 500 games virtually nothing changed as far as the
>score is concerned, or the difference is so small it's too dubious to take
>into account.
>
>IMO at some point it is necessary to draw a line. I don't need to play
>bizillions of games between Fritz7 and Junior7 to come to the conclusion that Fritz7
>is better than Junior7, because this is a fact and has been proved in almost all
>accounts of tournaments and matches played by so many members on this forum. But
>naturally if you play 10 games and come to the conclusion that program x is better
>than program y (when it is well known that the strength difference between these
>programs is not much), it is perhaps not acceptable. There is no need to toss
>coins around to find that out, it is common sense.
>
>Another aspect to consider is that, IMO, statistics perhaps doesn't relate to
>chess games, because there can be several reasons for different kinds of results
>in chess matches (tournaments), i.e., bad opening lines, operator mistake, slow
>computer, program bug, OS, insufficient memory and so on....

Statistics relate to everything. To whether the light comes on when I turn the
switch. To whether my car starts. To whether I get into an accident on the way
to work. Everything.

>I can assure you that the majority of the people around here can generate results
>comparable (almost) to SSDF, and they don't run on two computers most of the
>time, nor do they play at the same time control, but still they replicate the
>results almost always.

If one program is far stronger than the other, you can find out pretty quickly.
If they are of the same strength, it is a fairly random walk, and takes a very
long sequence of trials to determine the outcome.

>IMO it's not always about tossing coins (not being as inetelling (?) as most of
>you guys I might be wrong), coz there are many variables related to it.
>
>sooner or later it's the same highway.

However, our intuition leads us astray quite often.
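
For anyone who wants to check the 34.5/56 versus 31.5/56 comparison rather than take it on faith, here is a minimal Python sketch. It is only a rough normal-approximation test under stated assumptions: games are treated as independent, the two scores are treated as independent of each other (which may not hold if the engines met in the same event), and the 30% draw rate is a made-up illustrative value, not something taken from the tournament. The function names are hypothetical.

```python
import math


def per_game_variance(mean_score, draw_rate):
    """Variance of one game's result (win=1, draw=0.5, loss=0).

    With win probability w and draw probability d, the mean is w + d/2 and
    the variance works out to mean*(1 - mean) - d/4.
    """
    win = mean_score - draw_rate / 2
    loss = 1 - mean_score - draw_rate / 2
    if win < 0 or loss < 0:
        raise ValueError("draw_rate is not compatible with mean_score")
    return mean_score * (1 - mean_score) - draw_rate / 4


def z_score(points_a, points_b, games, draw_rate=0.3):
    """Approximate z statistic for the difference between two score rates.

    draw_rate is an ASSUMED value; the original results do not report it.
    """
    rate_a = points_a / games
    rate_b = points_b / games
    var_a = per_game_variance(rate_a, draw_rate) / games
    var_b = per_game_variance(rate_b, draw_rate) / games
    return (rate_a - rate_b) / math.sqrt(var_a + var_b)


if __name__ == "__main__":
    z = z_score(34.5, 31.5, 56)
    # A |z| below about 1.96 is not significant at the usual 5% level.
    print(f"z = {z:.2f}")
```

Under these assumptions the statistic comes out well below 1.96, so a 3-point gap over 56 games does not separate the two versions. A lower assumed draw rate only makes the gap look less significant, since it raises the per-game variance.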
Last modified: Thu, 15 Apr 21 08:11:13 -0700