Author: pavel
Date: 20:09:05 02/01/02
On February 01, 2002 at 22:45:55, Dann Corbit wrote:

>On February 01, 2002 at 22:35:20, pavel wrote:
>
>>On February 01, 2002 at 22:08:56, Tina Long wrote:
>>
>>>On February 01, 2002 at 21:39:41, pavel wrote:
>>>
>>>>crafty-1 home 2002
>>>>
>>>>
>>>>1  Crafty 17.14  2500  +85  34.5/56
>>>>2  Crafty 17.11  2500  +72  33.5/56
>>>>3  Crafty 18.11  2500  +65  33.0/56  918.00
>>>>4  Crafty 18.12  2500  +65  33.0/56  887.00
>>>>5  Crafty 18.01  2500  +58  32.5/56
>>>>6  Crafty 17.02  2500  +45  31.5/56  847.75
>>>>7  Crafty 18.13  2500  +45  31.5/56  847.50
>>>>8  Crafty 18.10  2500  +45  31.5/56  846.00
>>>>9  Crafty 17.01  2500  +39  31.0/56  855.50
>>>>10 Crafty 18.03  2500  +39  31.0/56  849.75
>>>....
>>>
>>>Keeping in mind the "worth" of these results:
>>>
>>>Good work Pavel, a very interesting tournament & I hope Jonas does follow up on
>>>this (I would suggest he use your rankings 1,2,4,5,6,7, against whatever he
>>>likes - JMO)
>>>
>>>I wonder now, when I quoted 17.16 as the best recently, if I had recalled
>>>wrongly & meant 17.14 (particularly as 17.16 doesn't exist)
>>>
>>>Anyway, until I see Jonas' results I'm moving my 18.13 aside & installing 17.14.
>>>
>>>Incidentally: http://www.chessbase.de./download/index.asp?cat=Engines
>>>
>>>"Search in this category"
>>>Search for Crafty, & http://www.chessbase.de./download/searchresult.asp
>>>gives all the Comets Craftys & the Bam Bam.
>>>
>>>Keep it up & please keep us informed,
>>>
>>>Tina
>>
>>
>>I don't know for sure how much you can trust this score, but it can't be way
>>off, since it has been discussed (or tested) several times before that
>>these versions (17.14/17.11) are strong(er).
>>
>>Anyway, I am interested in Jonas' test because it will be on a different
>>platform, with ponder=on, a different time control and different engines.
>>
>>Summing it all up, it should be interesting for crafty fans. :)
>>
>>But again (if I know people over here well enough), someone will come up with
>>tossing coins and try to prove that he is a good statistician (and ridiculously
>>relate that to chess), others will come with "ifs" and "buts" and "mores" and
>>"duhs" ;).
>>
>>Oh, and of course another good conclusion could be that the latest crafty
>>versions are tweaked for humans and do not play as well against comps....
>
>The number 1 entrant in the best crafty sweepstakes:
>>>>1 Crafty 17.14 2500 +85 34.5/56
>
>The number 7 entrant in the best crafty sweepstakes (current version):
>>>>7 Crafty 18.13 2500 +45 31.5/56 847.50
>
>Notice that #7 scored 31.5/56 and number 1 scored 34.5/56. There is absolutely
>no statistical significance to that result. A whopping 3 more points in 56
>games.
>
>When programs are evenly matched, that is (paradoxically) the hardest situation
>to discern which one really is stronger. It would take hundreds of thousands of
>games to be fairly certain. It would take at least one thousand games to even
>have a good idea which is stronger.
>
>But if it enhances the feeling of security, pick whichever one you like best.
>Just be aware that there is no logical reason for one choice over another from
>the data on this list.
>
>Please think back to the Junior Fritz match. What if the see-saw battle were
>cut off early? A large amount of variation is not unusual.

I agree that a "decent" number of games is necessary to get a "general" idea, but I don't believe one needs to play thousands of games to draw any kind of conclusion. I have played some 1000-game matches myself, and I posted them on the Winboard forum, if you remember.
It was Yace against other engines (Gandalf, I think). From those 1000-game matches (there were several), I can tell you that I didn't need that many games to conclude that Yace would be better: after 500 games virtually nothing changed as far as the score was concerned, or the difference was so small that it was too dubious to take into account. IMO, at some point it is necessary to draw a line.

I don't need to play bazillions of games between Fritz7 and Junior7 to conclude that Fritz7 is better than Junior7, because this is a fact and has been borne out by almost all of the tournaments and matches played by so many members of this forum. But naturally, if you play 10 games and conclude that program X is better than program Y (when it is well known that the strength difference between these programs is small), that is perhaps not acceptable. There is no need to toss coins around to find that out; it is common sense.

Another aspect to consider is that, IMO, statistics perhaps don't relate to chess games, because there can be several reasons for different kinds of results in chess matches (and tournaments): bad opening lines, operator mistakes, a slow computer, a program bug, the OS, insufficient memory and so on....

I can assure you that the majority of the people around here can generate results (almost) comparable to the SSDF's, and most of the time they don't run on two computers, nor do they play at the same time control, but they still replicate the results almost always. IMO it's not always about tossing coins (not being as intelligent (?) as most of you guys, I might be wrong), because there are many variables involved. Sooner or later it's the same highway.

pavs ;)
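For anyone who wants to put numbers on the significance question discussed above, here is a minimal Python sketch (not from either poster; the function names are made up for illustration). It assumes the usual logistic Elo formula and treats every game as an independent trial with the worst-case, no-draw variance p(1-p), so the intervals it produces are conservative. It also glosses over the fact that these are round-robin scores against a shared field, not a head-to-head match.

```python
import math


def elo_from_score(p):
    """Elo difference implied by an expected score p, logistic model (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def score_from_elo(d):
    """Expected score for a rating edge of d Elo points, logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


def score_interval(points, games, z=1.96):
    """Approximate two-sided interval (95% for z=1.96) for the true score fraction.

    Assumes independent games and the worst-case per-game variance p*(1-p),
    i.e. no draws. Draws only shrink the variance, so this is conservative.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p - z * se, p + z * se


def games_needed(elo_edge, z=1.96):
    """Rough number of head-to-head games before an elo_edge advantage sits
    z standard errors away from an even 50% score (worst-case variance 0.25)."""
    delta = score_from_elo(elo_edge) - 0.5
    return math.ceil(z * z * 0.25 / (delta * delta))


if __name__ == "__main__":
    # Scores from the crafty-1 table quoted above: #1 (17.14) and #7 (18.13).
    for label, points in (("17.14", 34.5), ("18.13", 31.5)):
        lo, hi = score_interval(points, 56)
        print(f"{label}: {elo_from_score(points / 56):+.0f} Elo, "
              f"95% interval {elo_from_score(lo):+.0f} to {elo_from_score(hi):+.0f}")
    for edge in (10, 20, 50):
        print(f"{edge} Elo edge: about {games_needed(edge)} games")
```

Run as-is, 34.5/56 comes out at about +82 Elo and 31.5/56 at about +44, close to the +85 and +45 in the table (the rating tool's exact formula may differ). The two 95% intervals span roughly -8 to +185 and -47 to +141 Elo, so they overlap almost completely. Under the same assumptions, a 10 Elo edge needs on the order of 4,600 games to stand out at that level, a 50 Elo edge fewer than 200; draws lower the per-game variance, so real matches need somewhat fewer.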
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.