Computer Chess Club Archives



Subject: Crafty MPC tests (long post)

Author: Robert Allgeuer

Date: 09:00:09 10/18/03



I have tested the Multi-ProbCut version of Crafty 18.15 (Crafty-MPC, executable
taken from Dann Corbit's ftp site) fairly extensively to measure the impact of
ProbCut on its playing strength. Here are my results.




Participants and Settings:
==========================

Crafty-MPC: version of Crafty 18.15 with Multi-ProbCut added, from Dann
Corbit's ftp site; cleanbook from Dann Corbit's ftp site. See also
http://www.cs.ualberta.ca/~mburo/ps/chessmpc.ps.gz

Crafty 18.15: standard version of Crafty 18.15; otherwise the same
configuration as above.

Crafty.rc file for both versions:
hash 96M
hashp 8M
cache 8M
drawscore=0
log off
book on
book random 1
book width 5
learn 0
resign 9
tbpath=f:\tb
egtb

Book and egtbs were turned off for the test suite runs.

Ponder was off; all 3-, 4- and 5-man egtbs were installed.

A range of free chess engines served as opponents in the gauntlet
tournaments.




Platform and Tools:
===================

Athlon Thunderbird 1.1 GHz
512 MB RAM
Windows 2000

Crafty-MPC
Crafty 18.15
Elostat 1.1b
PGN-Extract 15.0 (for eliminating duplicate games)
lgpgnver 1.1 (for checking correctness of draw claims)
Winboard 4.2.3
WB Tourney Manager 0.60 (Jori Ostrovskij)
Uci2wb 2.0 (R. Pfister)




Tests:
======

1.) Nps and Search Depth

ProbCut slows down the engine (in nodes per second) by 2 to 6%: by about 5 to
6% in the middlegame and by 2 to 3% in the endgame. The (nominal) search depth,
on the other hand, increases by 1.2 to 1.9 plies.



2.) Gauntlet Tournament at Blitz time control (300+2) against all opponents:

Crafty-MPC v18.15DC playing 20 games against each other engine. Score: 321 / 540
(59%)

Rank|No  |Name                     |                    |Pts               |
----|----|-------------------------|--------------------|------------------|
  1.|  2.|Crafty v17.14DC          |====11111=111=11=111|    16.5 / 20  82%|
  2.| 10.|Little Goliath 2000 v3.9 |11110111==110===0=1=|    13.5 / 20  67%|
  3.| 11.|Green Light Chess v3.00  |100110001=1==1101111|    12.5 / 20  62%|
  4.|  9.|Crafty v18.15DC          |=0======1=11=10=====|    11.0 / 20  55%|
  5.| 15.|LambChop v10.99          |00101011=0011111=00=|    10.5 / 20  52%|
  6.|  1.|Ruffian v1.0.1           |110000=1101===1=0=10|    10.0 / 20  50%|
  7.|  8.|Delfi v4.2               |===0111==0===0001110|    10.0 / 20  50%|
  8.| 19.|Comet B44-2              |0111=100=1=001=1=0=0|    10.0 / 20  50%|
  9.| 18.|Tao v5.4                 |0110001==00110=10110|     9.5 / 20  47%|
 10.| 20.|Amy v0.8.3               |==00100111001=1=010=|     9.5 / 20  47%|
 11.|  3.|Aristarch v4.21          |=0110==1=10=010=1000|     9.0 / 20  45%|
 12.|  6.|SoS 3                    |010100=1101=10001==0|     9.0 / 20  45%|
 13.| 12.|Pharaon v2.62            |0100011=0=0=010101=1|     9.0 / 20  45%|
 14.| 14.|Ktulu v3.9               |100=0101000=01==1001|     8.0 / 20  40%|
 15.| 13.|Crafty v19.01DC          |=0=000==0=11==0=01=0|     7.5 / 20  37%|
 16.| 22.|Comet B60                |01=00000=1=========0|     7.5 / 20  37%|
 17.|  7.|Pepito v1.59 profile     |1=1=000=0000000=1011|     7.0 / 20  35%|
 18.|  4.|Yace Paderborn           |011000101=000100=000|     6.0 / 20  30%|
 19.|  5.|SmarThink v0.16b++       |0=0=00=010=10=000=01|     6.0 / 20  30%|
 20.| 24.|Leila v0.53h             |1===0010000=00=00==0|     5.5 / 20  27%|
 21.| 27.|Beowulf v2.2             |=01=00=1=1000000000=|     5.5 / 20  27%|
 22.| 16.|Gromit v3.8.2            |10000=0=00=00==0=100|     5.0 / 20  25%|
 23.| 17.|Anmon v5.22              |==00000=000==0=1==00|     5.0 / 20  25%|
 24.| 21.|PostModernist v1.007     |00==000=001=00000101|     5.0 / 20  25%|
 25.| 23.|Francesca M.0.0.9        |010000010=100=000000|     4.0 / 20  20%|
 26.| 25.|Tcb v0045                |000=01=0=00100000000|     3.5 / 20  17%|
 27.| 26.|SlowChess v2.78          |00000101=00010000000|     3.5 / 20  17%|
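
As a cross-check of the table above, the following minimal Python sketch
recomputes points from the result strings. It assumes that each row shows the
opponent's results against Crafty-MPC (1 = win, 0 = loss, = = draw), which is
consistent with the Pts column; the function names are made up for this
illustration and only three rows are copied for brevity.

```python
# Points per game symbol as used in the cross table: win, loss, draw.
POINTS = {"1": 1.0, "0": 0.0, "=": 0.5}


def opponent_points(results: str) -> float:
    """Sum the points an opponent scored, given its result string."""
    return sum(POINTS[ch] for ch in results)


def gauntlet_score(rows: dict) -> tuple:
    """Return (points scored by the gauntlet engine, games played)."""
    games = sum(len(r) for r in rows.values())
    against = sum(opponent_points(r) for r in rows.values())
    # Every point not scored by an opponent goes to the gauntlet engine.
    return games - against, games


# Three rows copied from the table (assumed to be from the opponent's view).
rows = {
    "Crafty v17.14DC": "====11111=111=11=111",
    "Ruffian v1.0.1": "110000=1101===1=0=10",
    "Yace Paderborn": "011000101=000100=000",
}

for name, results in rows.items():
    print(f"{name:20s} {opponent_points(results):4.1f} / {len(results)}")
print("Crafty-MPC vs these three:", gauntlet_score(rows))
```

Summing the Pts column over all 27 rows gives 219 opponent points, so
Crafty-MPC scored 540 - 219 = 321, matching the reported total.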


The MPC version performed rather inconsistently: while reaching very good
scores against e.g. Ruffian, Yace, Aristarch and SmarThink, it performed
significantly worse than normal Crafty 18.15 against some of the other engines
(e.g. Crafty 17.14, Little Goliath, Green Light Chess, Comet and LambChop).
Crafty 17.14 really took Crafty-MPC apart. The MPC version also lost the direct
comparison with Crafty 18.15, although this match was tight.
The performance of Crafty 18.15 was generally more consistent and predictable
(i.e. low scores against the best engines and higher scores against weaker
engines).


Overall this results in the following ratings (after elimination of duplicate
games):

    Program                    Elo    +   -   Games   Score   Av.Op.  Draws

    Crafty-MPC v18.15DC      : 2563   26  27   531    59.8 %   2495   26.6 %
    Crafty v18.15DC          : 2550   26  24   547    57.5 %   2497   29.1 %

This indicates that the measured overall gain in strength due to Multi-ProbCut
is within the error margins and statistically not significant.
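
The reasoning behind this statement can be sketched in a few lines of Python.
This is only an illustration: it assumes the standard logistic Elo formula
(Elostat's exact method is not described here) and uses interval overlap as a
rough stand-in for a proper significance test. The function names are
hypothetical.

```python
import math


def elo_offset(score: float) -> float:
    """Rating difference implied by a score fraction (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def intervals_overlap(elo_a, plus_a, minus_a, elo_b, plus_b, minus_b) -> bool:
    """True if [elo - minus, elo + plus] of A and B share at least one point."""
    return (elo_a - minus_a) <= (elo_b + plus_b) and \
           (elo_b - minus_b) <= (elo_a + plus_a)


# Average opponent rating plus the offset implied by the score percentage.
mpc_estimate = 2495 + elo_offset(0.598)
std_estimate = 2497 + elo_offset(0.575)
print(round(mpc_estimate), round(std_estimate))

# Ratings and error margins taken from the table above.
print(intervals_overlap(2563, 26, 27, 2550, 26, 24))
```

Under these assumptions both estimates land within a point or so of the
tabulated ratings, and the two error intervals overlap over most of their
width, so the 13-point difference cannot be distinguished from noise.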



3.) Matches at longer time controls:

To get an indication of whether ProbCut needs longer time controls to show its
real advantages, a series of matches was played at time controls exactly 4
times and 16 times as long (i.e. 1200+8 and 4800+32).
For 1200+8 four engines were chosen: Ruffian 1.0.1, Crafty 18.15 and the two
engines closest to the Blitz rating of Crafty-MPC, Delfi 4.2 and Aristarch 4.21:

Crafty-MPC v18.15DC playing 20 games against each other engine. Score: 40 / 80
(50%)

Rank|No  |Name               |                    |Pts             |
----|----|-------------------|--------------------|----------------|
  1.|  2.|Ruffian v1.0.1     |=011011=011010010=1=|  11.0 / 20  55%|
  2.|  3.|Aristarch v4.21    |===1=0===01=01==110=|  10.5 / 20  52%|
  3.|  4.|Delfi v4.2         |110101=01=0010010101|  10.0 / 20  50%|
  4.|  5.|Crafty v18.15DC    |10==0======010=10==0|   8.5 / 20  42%|


Due to time constraints, only one 20-game match, against Ruffian, was played
at the time control of 4800+32:

Crafty-MPC v18.15DC playing 20 games against each other engine. Score: 7 / 20
(35%)

Rank|No  |Name               |                    |Pts             |
----|----|-------------------|--------------------|----------------|
  1.|  2.|Ruffian v1.0.1     |0111=====11=0=11=101|  13.0 / 20  65%|


These games at longer time controls are certainly not sufficient to draw a
final conclusion, but it appears unlikely that the Multi-ProbCut version
significantly increases in playing strength with longer time controls. On the
contrary, it scored lower against Ruffian and Aristarch than in Blitz, and
only equal against "Blitz-specialist" Delfi, improving only against Crafty
18.15.
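
The comparison between Blitz and 1200+8 can be reproduced with the short
Python sketch below. The opponent points are copied from the two cross tables;
the helper name is made up for this illustration, and each match is taken to
be 20 games as stated in the table headers.

```python
GAMES = 20  # games per match, as stated in the table headers

# Opponent points against Crafty-MPC: (Blitz 300+2, longer control 1200+8).
opponent_points = {
    "Ruffian v1.0.1": (10.0, 11.0),
    "Aristarch v4.21": (9.0, 10.5),
    "Delfi v4.2": (10.0, 10.0),
    "Crafty v18.15DC": (11.0, 8.5),
}


def mpc_change(blitz_opp: float, slow_opp: float, games: int = GAMES) -> float:
    """Change in Crafty-MPC's points from Blitz to the longer time control."""
    return (games - slow_opp) - (games - blitz_opp)


changes = {name: mpc_change(blitz, slow)
           for name, (blitz, slow) in opponent_points.items()}

for name, delta in changes.items():
    print(f"{name:18s} {delta:+.1f}")
print("net change:", sum(changes.values()))
```

Against these four opponents Crafty-MPC scored 40 / 80 at both time controls,
so the gain against Crafty 18.15 is exactly offset by the losses against
Ruffian and Aristarch.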



4.) Test Suite Runs:

Four test suites were run: LCT-II (mixed), gelfer (positional), wcsac
(tactical) and speelman (endgame):


  Program               lct-2          gelfer          wcsac          speelman
                       600 sec         20 sec          20 sec          20 sec
  Crafty 18.15          2620            113             859             112
  Crafty MPC            2410            113             808             110


Crafty-MPC is significantly weaker in tactics than Crafty 18.15, and in fact
weaker than any other chess engine I have come across so far. Its wcsac score
is the lowest I have ever obtained (the lowest until now was Gerbil 02 with
822). The low LCT-II score (for comparison: this is at about the same level as
ExChess) is almost exclusively due to very weak performance in the tactical
portion of the LCT-II test suite.




Conclusion:
===========

My tests indicate that the overall playing strength of Crafty 18.15 remains
more or less unchanged by the addition of Multi-ProbCut. However, the
character of the engine changes significantly due to ProbCut: even though
nominal search depth increases by one to two plies, tactical strength is
severely reduced.
Furthermore, with ProbCut match results become more unpredictable and
inconsistent: apparently there are types of opponents against which ProbCut
works very well and leads to significantly better results, but there are
also other opponents (the tactically stronger ones?) against which ProbCut has
exactly the opposite effect.


Robert




Last modified: Thu, 15 Apr 21 08:11:13 -0700
