Author: Albert Silver
Date: 09:48:03 10/02/04
On October 02, 2004 at 06:10:03, Andrey Popov wrote:
>Tournaments are used to measure the improvement of engines. However, they have
>some disadvantages.
>Firstly, holding a large, repeated tournament is a big and difficult task.
>Secondly, an improved version can finish in a lower place or drop into a lower
>division because its opponents have also improved. This phenomenon disappoints
>authors and misinforms other people. Thirdly, it is impossible to include several
>versions of the same engine in one tourney because it would take too many games.
>Some people have run gauntlets. They take, say, 30 engines and hold a
>round-robin tourney. Later they add a 31st engine, which plays each of the other 30.
>However, once there are, say, 200 engines in 5 versions each, they must play 1000
>matches just to add one engine. That is why they give up the hobby.
>
>I suggest another method. Perhaps someone has already suggested it, but I doubt it.
>You can choose, say, 20 (or 30) stable engines (strong and medium).
>Engines that will receive no further free updates (e.g. Phalanx XXII, Ruffian
>1.05, Pepito 1.59) and well-known, old, bug-free versions (e.g. Crafty 17.14,
>19.03, LG2000v3.5) are preferred.
>New engines and new versions play 40 (or 60) games, only against these examiners.
>They do not play each other. An examined version can be deleted from your hard
>disk immediately afterwards. The tournament director can choose to test only the
>interesting versions. He can leave long intervals between tests because there is
>no schedule for this kind of tournament. Every new engine needs only 40 (or 60)
>games.
>The conditions could be 5min+1s, no learning, no books(?) or something else.
>Of course, a tournament director with a computer is needed :).
>What do you think about it?
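The match-count argument in the quoted post can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, "matches" means pairings rather than individual games, and splitting 40 games over 20 examiners as 2 games each (one with each colour) is an assumption, since the post gives only the totals. The point it shows is that the cost of adding one engine grows with the pool under the add-one-to-everyone approach, but stays fixed under the examiner scheme.

```python
# Hypothetical helpers for comparing test costs; they count pairings, not games.

def round_robin_matches(n: int) -> int:
    """Total pairings in a full round robin among n entrants."""
    return n * (n - 1) // 2

def matches_to_add_one(pool_size: int) -> int:
    """Pairings needed when a newcomer must meet every existing entrant."""
    return pool_size

def examiner_games(examiners: int, games_per_examiner: int) -> int:
    """Games a newcomer plays under the fixed-examiner scheme.
    The cost does not depend on how many engines were tested before."""
    return examiners * games_per_examiner

pool = 200 * 5  # 200 engines in 5 versions each, as in the quoted post
print(round_robin_matches(31))   # 465 pairings for a 31-engine round robin
print(matches_to_add_one(pool))  # 1000 pairings to add one more entrant
print(examiner_games(20, 2))     # 40 games, assuming 2 per examiner
```

With 30 examiners at the same assumed 2 games each, the scheme gives the 60-game figure mentioned in the post.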
First of all, I presume you're talking about free engines, because you mention
the constant changes and improvements that make engine ratings unstable, whereas
the commercial engines are tested by the SSDF and have very stable ratings, except
for those with few games, such as Ruffian 2.0.
Personally, I like to run Nunn gauntlets, as they remove the issue of book
randomness and let me test against known results. For example, I know how Pro Deo
scores against Junior 8 or Hiarcs 9, thanks also to Kurt's testing, and can then
compare that with the results of my personal settings.
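One common way to compare a gauntlet score with a known reference result (not something described in this post) is to convert each score into an Elo performance rating using the logistic Elo formula. The sketch below makes that assumption; the function names and the 2600 opponent rating are hypothetical, and using the average opponent rating is itself an approximation when opponents differ widely in strength.

```python
import math

def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation for a player rated `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def performance_rating(points: float, games: int, avg_opponent: float) -> float:
    """Rating whose expected score against avg_opponent equals the observed
    score fraction (the inverse of expected_score)."""
    s = points / games
    if not 0.0 < s < 1.0:
        # 0% or 100% has no finite Elo equivalent under this model.
        raise ValueError("score fraction must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(s / (1.0 - s))

# Hypothetical example: 24/40 against opponents averaging 2600.
print(round(performance_rating(24, 40, 2600)))  # 2670
```

Comparing two such numbers (for example, default settings versus personal settings against the same opponents) turns "scores a bit better" into an approximate Elo difference, although 40 games still leave a wide margin of error.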
As to the time control, I don't think 5-minute games are a good test unless
you're only really interested in how well an engine does in blitz. Some engines,
such as Ruffian, do proportionately better in blitz than at slow time controls
(not that Ruffian is weak at slow time controls by any means). My preferred time
control is 30mins+5s.
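The practical cost of the slower time control can be estimated from base time plus increment. The sketch below assumes that both sides use their whole clock and that a game lasts about 60 moves per side. Both assumptions are illustrative and do not come from the post, so the results are rough upper-bound estimates, not measurements.

```python
def max_game_minutes(base_min: float, increment_s: float, moves_per_side: int) -> float:
    """Upper bound on one game's length if both sides use their full clock."""
    per_side = base_min + increment_s * moves_per_side / 60.0
    return 2 * per_side

def gauntlet_hours(games: int, base_min: float, increment_s: float,
                   moves_per_side: int = 60) -> float:
    """Approximate wall-clock hours for a gauntlet run on one machine."""
    return games * max_game_minutes(base_min, increment_s, moves_per_side) / 60.0

print(gauntlet_hours(40, 5, 1))   # 8.0 hours at 5min+1s
print(gauntlet_hours(40, 30, 5))  # about 46.7 hours at 30mins+5s
```

Under these assumptions, a 40-game run at 30mins+5s takes roughly six times as long as one at 5min+1s, which is the trade-off against the blitz bias mentioned above.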
Albert