Author: Kurt Utzinger
Date: 10:36:06 01/24/03
Quite a large number of games is needed to say anything concrete about the relative playing strength of two programs. In any case, a result of 8.0-2.0 does not mean anything. In this respect I would like to recall my experiment, a match over 100 rounds at time control 40'/40:

Gandalf 4.32g vs Program X [I was beta tester for X]

Games   1-10   3.0-7.0  [win Program X]   Total   3.0-7.0   for Program X
Games  11-20   6.5-3.5  [win Gandalf]     Total   9.5-10.5  for Program X
Games  21-30   5.0-5.0  [draw]            Total  14.5-15.5  for Program X
Games  31-40   3.5-6.5  [win Program X]   Total  18.0-22.0  for Program X
Games  41-50   4.5-5.5  [win Program X]   Total  22.5-27.5  for Program X
Games  51-60   3.0-7.0  [win Program X]   Total  25.5-34.5  for Program X
Games  61-70   5.0-5.0  [draw]            Total  30.5-39.5  for Program X
Games  71-80   8.0-2.0  [win Gandalf]     Total  38.5-41.5  for Program X
Games  81-90   7.0-3.0  [win Gandalf]     Total  45.5-44.5  for Gandalf
Games  91-100  5.5-4.5  [win Gandalf]

Final match result 51.0-49.0 for Gandalf

Enough statistics now. I thought it important, though, to demonstrate that one needs a lot of games and a lot of time to say with a given reliability that program A is better than program B. And nowadays things have become more complicated due to the influence of opening books, learning functions, tablebases, hash tables and so on.

Kurt

Homepage "Kurt & Rolf Chess": http://www.beepworld.de/members39/utzinger
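As a rough illustration of the point made in the post, the block scores of the match can be fed into a small Python sketch that recomputes the running totals and puts an approximate error bar on the final result. This is not part of the original experiment. It assumes the games are independent, uses a normal approximation with the variance bound p(1-p) (draws only make the true variance smaller), and converts scores to rating differences with the usual logistic Elo formula. The function names are made up for this example.

```python
import math

# Per-block scores (Gandalf, Program X) for the ten 10-game blocks in the post.
BLOCKS = [
    (3.0, 7.0), (6.5, 3.5), (5.0, 5.0), (3.5, 6.5), (4.5, 5.5),
    (3.0, 7.0), (5.0, 5.0), (8.0, 2.0), (7.0, 3.0), (5.5, 4.5),
]


def running_totals(blocks):
    """Return the cumulative (Gandalf, Program X) score after each block."""
    totals, a, b = [], 0.0, 0.0
    for score_a, score_b in blocks:
        a += score_a
        b += score_b
        totals.append((a, b))
    return totals


def score_interval(points, games, z=1.96):
    """Approximate interval for the true score fraction.

    Assumption: independent games and a normal approximation; p*(1-p)
    is an upper bound on the per-game variance when draws are possible.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return max(0.0, p - z * se), min(1.0, p + z * se)


def elo_diff(p):
    """Rating difference implied by score fraction p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    for i, (a, b) in enumerate(running_totals(BLOCKS), start=1):
        print(f"after {10 * i:3d} games: {a:4.1f}-{b:4.1f}")
    low, high = score_interval(51.0, 100)
    print(f"score fraction between {low:.3f} and {high:.3f}")
    print(f"Elo difference between {elo_diff(low):+.0f} and {elo_diff(high):+.0f}")
```

Under these assumptions, the 51.0-49.0 result after 100 games is compatible with a true score anywhere from roughly 41% to 61%, so even this match does not settle which program is stronger, and a single 10-game block settles far less.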
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.