Author: Georg v. Zimmermann
Date: 14:23:40 09/26/99
I was just about to make some changes to Crafty (the time-function), and before doing that I wanted to test its strength, so that I can compare later. I didn't want to use the autoplayer (people tend to get strange results) or the cb-adaptor (where I get _very_ strange results), so I let it play against the (IMHO) second-best freeware engine, "Comet", under WinBoard.

I made sure both got the same hash table size (checked with a memory tool). I made sure both got the same processor time (checked with a system tool); of course no other processes were running, I restarted before every match, and I have a very stable system. I deleted the learning files after every match. I let them play from the Nunn positions, once with white and once with black, because I wanted to test the engine, not the opening book. I made sure both had 4-man TB access. (btw: I'd still like to know which 5-man TBs you think are the most useful.)

Hardware and time control: K6-II/400, 15 min/game.

Result: newest Crafty 16.9: 15,5, newest CometB06: 4,5 (!).

Hey, I thought, this can't be: Comet isn't _that_ weak! So I ran another test with _exactly_ the same configuration, only with 5 0 games. Result (Crafty : Comet): 9,5 : 10,5. Comet won (!).

Hey, I thought, this can't be: there is too much of a difference between those matches. So I ran another test with the same configuration, only with 14 0 (!) games. Result (Crafty : Comet): 14,0 : 6,0. That is 1,5 more points for Comet just from playing 14 0 instead of 15 0!

What does this teach us?

1.) Forget about any serious testing (hello SSDF! ;-) ) if you don't play at least 200 matches between every pair of engines. Otherwise you just get garbage results.

2.) We should never _ever_ say that anyone cheats because results differ this much, when they differ this much even on one computer! Even if that person does use some unwise formulations.

Best regards, Tec--
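To put rough numbers on point 1.), here is a minimal sketch of how wide the uncertainty on a 20-game match score is. It assumes the games are independent and uses a plain normal approximation with a binomial standard error, which is only an upper bound once draws are involved. The function name `score_interval` and the 95% level (`z=1.96`) are my own choices for illustration, not something taken from any testing tool.

```python
import math

def score_interval(points, games, z=1.96):
    """Return (score, low, high) as fractions of the maximum score.

    Assumption: games are independent. The binomial standard error
    sqrt(p*(1-p)/n) is an upper bound when draws occur, because a
    per-game score in {0, 0.5, 1} has variance at most p*(1-p).
    """
    p = points / games
    se = math.sqrt(p * (1 - p) / games)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p, low, high

# Crafty's points in the three 20-game matches from the post.
matches = {"15 0": 15.5, "5 0": 9.5, "14 0": 14.0}

for time_control, points in matches.items():
    p, low, high = score_interval(points, 20)
    print(f"{time_control}: score {p:.3f}, approx. 95% range {low:.3f} to {high:.3f}")

# The same calculation for a hypothetical even 200-game match.
p, low, high = score_interval(100, 200)
print(f"200 games: score {p:.3f}, approx. 95% range {low:.3f} to {high:.3f}")
```

Under these assumptions the three 20-game ranges all overlap, so the matches do not really contradict each other, while at 200 games the range shrinks to roughly 7 percentage points either side of the score.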
Last modified: Thu, 15 Apr 21 08:11:13 -0700