Computer Chess Club Archives


Subject: Re: How to setup fair and reasonable enginematches?

Author: Shaun Brewer

Date: 10:30:41 07/26/02


On July 26, 2002 at 11:56:10, Joachim Rang wrote:

>On July 26, 2002 at 06:15:06, Shaun Brewer wrote:
>
>>On July 25, 2002 at 14:40:06, Joachim Rang wrote:
>>
>>>I'm going to run some automatches between amateur engines. As this is the first
>>>time I have tried this, I'd be glad to get some advice.
>>>
>>>I will use a PIII notebook at 1.1 GHz with 256 MB RAM. I plan to run not a
>>>tournament but one-to-one matches with many games (a few hundred) to get
>>>significant results.
>>>
>>>What are fair and reasonable test conditions?
>>
>>Try to reduce the number of background activities. On one machine I run Norton
>>System Works which seems to ruin results. So I don't use this for matches.
>>
>>>
>>>I will run the match on a PIII 1.1 GHz with 256 MB RAM under WinXP. I think it's
>>>reasonable to give every engine an 80 MB hash table. I still don't know which
>>>GUI I should use. I tested Arena and it worked, but with some problems. With
>>>Winboard I don't know how to run automatches; maybe someone can help?
>>>
>>>The first engine will be Yace 0.99.56, which I consider the strongest amateur engine
>>>today.
>>>Which engine can compete with Yace?
>>>
>>
>>Crafty - I have so far found 17.10 to be stronger than 18.15
>>
>>Tests on Tbird Athlon 850 (120+2)
>>
>>Individual statistics:
>>
>>(1) wcrafty-17.10             : 500 (+166,=246,- 88), 57.8 %
>>(2) Crafty-18.15              : 500 (+ 88,=246,-166), 42.2 %
>>
>>    Program                            Score     %    Av.Op.  Elo    +   -   Draws
>>
>>  1 wcrafty-17.10                  : 289.0/500  57.8   2373   2427   27  19   49.2 %
>>  2 Crafty-18.15                   : 211.0/500  42.2   2427   2373   19  27   49.2 %
>>
>>Tests on Tbird Athlon 850 (300+5)
>>
>>Individual statistics:
>>
>>(1) wcrafty-17.10             : 500 (+159,=231,-110), 54.9 %
>>(2) Crafty-18.15              : 500 (+110,=231,-159), 45.1 %
>>
>>    Program                            Score     %    Av.Op.  Elo    +   -   Draws
>>
>>  1 wcrafty-17.10                  : 274.5/500  54.9   2383   2417   29  20   46.2 %
>>  2 Crafty-18.15                   : 225.5/500  45.1   2417   2383   20  29   46.2 %
>>
>>Tests on Tbird Athlon 850 (900+15)
>>
>>Still running; something like +100 -40 =100 so far
>>
>>Tests on PIII 1332 MHz (1800+30)
>>
>>Still running (when the machine is not in use); too few games to be conclusive
>>
>>Individual statistics:
>>
>>(1) wcrafty-17.10             :  52 (+ 23,= 20,-  9), 63.5 %
>>(2) Crafty-18.15              :  52 (+  9,= 20,- 23), 36.5 %
>>
>>
>>>I think I will run the games with 10 minutes and 10 s per move for each engine.
>>>Is this too short?
>>
>>No correct answer to this one.
>>
>>Some engines may be poor at blitz relative to their standard performance, others
>>the reverse, and others consistent across time controls.
>>
>>The longer the time control the better the quality of the games, but a match long
>>enough to give statistically relevant results could take months.
>>
>>>
>>>Another question is which opening books I should use. Yace has its own book, as do
>>>most of the other engines. But I don't know how good they are; SOS, for example,
>>>comes with only a very small book of its own, which will be a handicap. Does someone
>>>know a "neutral" but large book which will run with most engines?
>>
>>For your experiment I would initially use the book provided/recommended by the
>>author. If you then test with a different book and this produces better
>>results, share your findings with the author.
>>
>>Have fun
>>
>>Shaun
>>
>>>
>>>That's all for today; I'd appreciate any suggestions from you.
>
>
>Interesting results for Crafty. Maybe I will make the first match between
>Crafty 17.10 and Yace 0.99.56.
>
>Which opening book did you use for your matches (don't say the one from Hyatt's
>FTP), and which configuration file?

I started my tests between 17.10 and 18.15 because I noticed a tournament result
in which 17.10 had come first, beating other 18.xx versions along the way. I
wanted to see if it was a statistical fluke. I still think it is possible that,
with a much faster machine or much longer time controls, 18.15 may be better, but
that is looking less likely as my experiment goes on.
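
For anyone who wants to check a result like this for flukiness themselves, here
is a minimal Python sketch. It assumes the usual logistic Elo model and a simple
normal approximation on the per-game scores; the function names are made up for
illustration and this is not how the rating tool that produced the tables above
works internally.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def match_summary(wins, draws, losses):
    """Return (score fraction, implied Elo difference, z-score vs. an even match)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Sample variance of a single game result (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    # How many standard errors the score lies from 50 %.
    z = (score - 0.5) / std_err
    return score, elo_diff(score), z

# The 120+2 match quoted above: +166 =246 -88 for 17.10
score, elo, z = match_summary(166, 246, 88)
print(f"score {score:.3f}, Elo diff {elo:.1f}, z {z:.2f}")
```

For the 120+2 match this comes out at roughly 55 Elo with z around 5, and for the
300+5 match (+159 =231 -110) at roughly 34 Elo with z around 3, so neither looks
like pure chance. It says nothing about whether the gap holds at longer time
controls.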

I have also seen posts indicating that 17.09 and 17.14 are very strong.

Because I was testing Crafty against Crafty and wanted to remove the effect of
different opening books on the match result, I used a book generated from GM
games using

book create database.pgn 50 3 50

To increase variety in the openings I used a tailored start.pgn for books and
bookc.

crafty.rc settings:

computer
hash 128M
hashp 32M
cache 4M
tbpath ../TB
egtb
resign 9
ponder off

hash/hashp end up as 96M and 32M respectively. I have 512 MB of RAM; with your
256 MB you will have to reduce these to avoid paging.

Shaun




Last modified: Thu, 15 Apr 21 08:11:13 -0700
