Computer Chess Club Archives



Subject: None of these tests are truly scientific!

Author: KarinsDad

Date: 12:54:03 01/26/99



Since I do not run programs vs. programs as a general rule, I have a slightly
different perspective.

To ensure valid data, you should:

1) Have a control. You have that with the posted games.

2) Run that control versus multiple changes of a single variable (e.g. run test
1 with one variable changed from the control, run test 2 with a different
single variable changed, run test 3 with a third, and so on).

3) Compare your results.
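The single-variable rule above can be sketched in code. This is a minimal illustration only; the engine name, time control, and hash values are assumptions for the example, not settings from the tests under discussion.

```python
# Hedged sketch: laying out a controlled test matrix where each run
# changes exactly one variable relative to the control.
control = {"engine": "Crafty", "time_s": 600, "hash_mb": 64, "book": "on"}

def variant(**change):
    """Copy the control settings with exactly one variable changed."""
    assert len(change) == 1, "change only one variable per test"
    run = dict(control)
    run.update(change)
    return run

runs = [
    control,
    variant(time_s=540),   # test 1: only the time changed
    variant(hash_mb=128),  # test 2: only the hash changed
    variant(book="off"),   # test 3: only the book changed
]
```

Comparing each run against the control then isolates the effect of that one variable.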

In this case, you have too many shifting variables to run a fully controlled
set of tests, so you must do the best you can with what you have.

In order to do this, you should run every program you can find on similar (or
identical, if you can get one person to do it) platforms with similar hash
sizes, similar time controls, etc. Try to get as close as possible (e.g. 10
minutes at 400 MHz instead of 60 seconds, etc.). Do not just run Crafty.

This would give you more information than just Bionic vs. Crafty.

If Bionic and Crafty were 95% similar, but Bionic and Fritz were 93% similar,
what have you proved?

My gut feeling is that you will not find anything which is statistically
significant. But we won't know that until the tests are run (be sure to post
everything you think is pertinent, such as matching criteria, speed,
duration, etc.).
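Whether a gap like 95% versus 93% means anything can be checked with a standard two-proportion z-test. The sketch below is illustrative only; the move counts are hypothetical assumptions, not numbers from any actual test.

```python
# Hedged sketch: two-proportion z-test for the difference between two
# move-match rates. All sample numbers are hypothetical.
import math

def match_significance(matches_a, n_a, matches_b, n_b):
    """Return the z statistic for the difference between two match proportions."""
    p_a = matches_a / n_a
    p_b = matches_b / n_b
    p_pool = (matches_a + matches_b) / (n_a + n_b)  # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical: 95 of 100 moves matched vs 93 of 100 gives z of roughly 0.6,
# far below the ~1.96 needed for significance at the 5% level.
z = match_significance(95, 100, 93, 100)
```

With samples that small, a two-point difference in match rate is nowhere near significant; it takes either a much larger gap or many more moves.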

If you broaden your tests like this, you leave yourself less open to criticism
such as "but you ran it for 9 minutes; if you had run it for 11 minutes, the
results would change by 13% as per my tests", etc. The response then is: but I
also ran Junior and Fritz at that 9 minutes, and Junior matched better and
Fritz matched worse.

KarinsDad




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.