Author: KarinsDad
Date: 12:54:03 01/26/99
Since I do not run programs vs. programs as a general rule, I have a slightly different perspective. To ensure valid data, you should:

1) Have a control. You have that with the posted games.
2) Run that control versus multiple changes of a single variable (i.e. run test 1 with a single variable changed from the control, run test 2 with a different single variable changed, run test 3, and so on).
3) Compare your results.

In this case, you have too many shifting variables to make a completely accurate set of tests, so you must do the best you can with what you have. To do this, you should run every program you can find on similar (or identical, if you can get one person to do it all) platforms with similar hash sizes, similar times, etc. Try to get as close as possible (i.e. 10 minutes at 400 MHz instead of 60 seconds, etc.). Do not just run Crafty. This would give you more information than just Bionic vs. Crafty. If Bionic and Crafty were 95% similar, but Bionic and Fritz were 93% similar, what have you proved?

My gut feeling is that you will not find anything statistically significant. But we won't know that until the tests are run (be sure to post everything you think is pertinent, such as matching criteria, speed, duration, etc.).

If you broaden your tests like this, you leave yourself less open to criticism such as "but you ran it for 9 minutes; if you had run it for 11 minutes, the results would change by 13% as per my tests." The response then is: "but I also ran Junior and Fritz at that same 9 minutes, and Junior matched better and Fritz matched worse."

KarinsDad
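The comparison step above can be sketched in a few lines: collect each engine's chosen move for the same set of test positions, then compute the fraction that matches the control (the posted games). All engine names and moves below are made-up illustration data, not results from any actual test.

```python
# Hedged sketch: pairwise move-match rates against a control game set.
# The move lists here are invented placeholders, not real engine output.

def match_rate(moves_a, moves_b):
    """Fraction of positions where two move lists agree."""
    if len(moves_a) != len(moves_b):
        raise ValueError("move lists must cover the same positions")
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return same / len(moves_a)

# One chosen move per test position (illustrative only).
control = ["e4", "Nf3", "d4", "c4", "g3"]   # e.g. the posted games

engines = {
    "Crafty": ["e4", "Nf3", "d4", "c4", "Nc3"],
    "Fritz":  ["e4", "Nc3", "e4", "c4", "g3"],
    "Junior": ["d4", "Nf3", "d4", "c4", "g3"],
}

for name, moves in engines.items():
    print(f"{name}: {match_rate(control, moves):.0%} match with control")
```

The point of running several engines under matched conditions is exactly what the loop shows: a single high match percentage means little until you can see where the other engines land on the same positions.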