Author: David Dahlem
Date: 12:17:47 06/20/04
On June 20, 2004 at 13:42:59, Steve Glanzfeld wrote:

>On June 20, 2004 at 12:10:57, David Dahlem wrote:
>
>>On June 19, 2004 at 12:02:19, Steve Glanzfeld wrote:
>>
>>>Users are simply interested in the engine's performance in such tests.
>>>Assigning ratings certainly isn't the main thing. Most often it is sufficient
>>>to count and compare the number of solutions.
>
>>I've seen numerous examples of one engine solving a test suite position in a few
>>seconds, while another engine of known equal game-playing strength never finds
>>the solution, even after hours of analysis. To me, this makes test suites
>>worthless, or at least makes the results very difficult to interpret.
>
>As long as your test suites consist of ONLY ONE SINGLE POSITION, you're right
>:)))
>
>Steve

Note that I said "numerous examples". So if, for example, 10% of a test suite's positions give unreliable results, how reliable are the total results? And there could be more than 10%; how is a test suite user to know? I gave an extreme example: suppose in a 20-minute-per-position suite, some engines in a group of equal playing strength fail to find some solutions. How is a user to know whether the same principle applies?

Test positions can be useful for debugging an engine, but for strength testing between engines, I think it's a waste of valuable time. :-)

Regards
Dave
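A back-of-the-envelope sketch (my own illustration, not from the post) of why even two engines of truly equal strength can post noticeably different suite scores. Treating each position as an independent pass/fail trial with the same solve probability for both engines, the standard deviation of the score difference grows with the suite size; the numbers `n_positions = 100` and `p_solve = 0.7` are assumed purely for the example.

```python
import math

def score_diff_std(n_positions, p_solve):
    """Standard deviation of the solution-count difference between two
    equal-strength engines on a suite of independent pass/fail positions.

    Per position, each engine solves with probability p_solve, so the
    per-position score difference has variance 2 * p * (1 - p); variances
    add across independent positions.
    """
    return math.sqrt(n_positions * 2 * p_solve * (1 - p_solve))

# Assumed example: a 100-position suite, both engines solving ~70%.
std = score_diff_std(100, 0.7)
print(round(std, 2))  # roughly 6.5 solutions of pure noise
```

So under these (assumed) numbers, a gap of half a dozen solutions between equal engines is entirely consistent with chance, which is the statistical core of the "how is a user to know?" objection.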
Last modified: Thu, 15 Apr 21 08:11:13 -0700