Computer Chess Club Archives



Subject: Re: how much does eval effect strength?

Author: Tim Foden

Date: 06:08:13 09/18/03



On September 18, 2003 at 08:45:28, Michael Yee wrote:

>[snip]
>
>>I did an experiment once... I made a version of GLC 2.18 that used only material
>>balance in its evaluation.
>>
>>It played a 20 game match against GLC 2.13.  During the match 2.18 averaged 2 to
>>4 ply more than 2.13 (mainly due to extra cut-offs, the NPS search speed was
>>only about 20% faster).
>>
>>The result... 2.13 won, by 19.5 points to 0.5 points.  This gives roughly a 640
>>ELO rating difference.
>>
>>Normally I would expect 2.18 to be about 50 ELO stronger than 2.13.
>>
>>So losing the 'positional' part of the eval function seems to have caused
>>around a 700 ELO drop in strength.
>>
>>Conclusion: the evaluation is important.  :)
>>
>
>[snip]
>
>Very interesting experiment!

I thought so too when I saw the result.

The games were interesting.  2.18's moves were basically aimless, so it was
constantly wasting time while 2.13 slowly built up a stronger position.  2.18
saw the tactical loss coming before 2.13 saw the tactical win, but by that
point 2.18 was in too deep a hole to climb out of.  Even the single draw was
lucky: it merely saved 2.18 from yet another loss.

>The result isn't exactly what de la Maza predicted
>in his "400 Points in 400 Days" article on ChessCafe:
>
>   You can refine this experiment further by creating two personalities,
>   one that can see three moves ahead but has no positional knowledge and
>   the other that can see two moves ahead and has complete positional
>   knowledge. The tactical personality, which can see three moves ahead,
>   will win the vast majority of the games.
>
>   [http://www.chesscafe.com/text/skittles148.pdf]

Interesting, but I think not backed up by any actual experiments.  If the
positional personality still had its quiescence search, I would find it
difficult to predict the outcome.  It's an easy experiment to make, though.

>
>I wonder how much of the 700 ELO points gained in GLC due to positional
>knowledge came from pawn structure, king safety, mobility, etc...

Yes, you could do the same experiments with various bits of the evaluation on
their own, and see what contribution they make.

Here are my notes from the different versions of 2.18 and the tests I made at
the time:

=========================================================================

2.18k (in CVS as 2.18j)

Has evaluation with only material eval.
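A material-only eval of this kind is just a summed piece-value difference; a minimal sketch (the piece values and layout are illustrative guesses, not GLC 2.18k's actual code):

```c
/* Minimal material-only evaluation sketch.  Piece values and data layout
   are illustrative, not GLC's. */
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, NPIECE };

static const int piece_value[NPIECE] = { 100, 300, 300, 500, 900 };

/* counts[side][piece]: how many of each piece each side has */
int eval_material(int counts[2][NPIECE])
{
    int score = 0;
    for (int p = 0; p < NPIECE; p++)
        score += piece_value[p] * (counts[0][p] - counts[1][p]);
    return score; /* positive means side 0 is ahead */
}
```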

Played 20 game match versus 2.13, 3+3.

Result: win 0, lose 19, draw 1.  Score: 0.5/20 (2.5%).

Average 1,300,000 nps. (2.13 was about 980,000 nps)
Average search depth about +2 (0 .. 4) ply relative to 2.13... so it looks
like the coarse quantization of the eval was a factor.

~~~~~~~~~~~~~~~~

2.18m

Has evaluation with only material eval and piece/square tables.  Lazy evaluation
is not used.
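Piece/square tables add a fixed per-square bonus on top of material; a rough sketch of the idea, here just favouring the centre (the shape and numbers are illustrative, not GLC's tables):

```c
/* Piece/square-table sketch: a fixed bonus per square, favouring the
   centre.  Values are illustrative, not GLC's actual tables. */
int pst_bonus(int sq) /* sq = 0..63 */
{
    int file = sq & 7, rank = sq >> 3;
    int df = file < 4 ? 3 - file : file - 4;  /* distance from centre files */
    int dr = rank < 4 ? 3 - rank : rank - 4;  /* distance from centre ranks */
    return 12 - 4 * (df + dr);                /* +12 centre .. -12 corner */
}

/* sum the bonus over one side's piece squares */
int eval_pst(const int squares[], int n)
{
    int score = 0;
    for (int i = 0; i < n; i++)
        score += pst_bonus(squares[i]);
    return score;
}
```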

Played 20 game match versus 2.13, 3+3.

Result: win 1, lose 16, draw 3.  Score: 2.5/20 (12.5%)

Average 950,000 nps.
Average search depth about -0.5 (-1 .. 0) ply relative to 2.13, rarely searched
deeper than 2.13, quite often 2.13 searched deeper, particularly in the endgame.

~~~~~~~~~~~~~~~~

2.18n

Has evaluation with only material eval and piece/square tables.  Quantizes the
eval using "eval = (eval + 0x20) & ~0x3F;" to get to the nearest 0.064 of a
pawn.  Lazy evaluation is not used.
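To spell out what that expression does: adding half the step (0x20) and then masking off the low six bits rounds the eval to the nearest multiple of 0x40 = 64 internal units (0.064 of a pawn if a pawn is 1000 units):

```c
/* The quantization from the note: round eval to the nearest multiple of
   0x40 (64 internal units).  Adding half the step then masking the low
   six bits is round-to-nearest in two's complement, for negatives too. */
int quantize(int eval)
{
    return (eval + 0x20) & ~0x3F;
}
/* quantize(31) == 0, quantize(32) == 64, quantize(-33) == -64 */
```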

Played 20 game match versus 2.13, 3+3.

Result: win 2, lose 12, draw 6.  Score: 5/20 (25%)

Average 945,000 nps.
Average search depth about +0 (-1 .. +1) ply relative to 2.13.

~~~~~~~~~~~~~~~~

2.18o

Has evaluation with only material eval and piece/square tables.  Uses lazy
evaluation with fixed +/-2 pawns boundary.
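Lazy evaluation here means skipping the expensive positional terms whenever the cheap material score is already more than the margin outside the alpha/beta window; a sketch of the fixed +/-2 pawn boundary idea (function names and the unit scale are hypothetical, not GLC's):

```c
/* Lazy-evaluation sketch with a fixed +/-2 pawn margin (200 units here).
   If the cheap material score is far enough outside (alpha, beta), the
   expensive positional terms can't bring the score back inside the
   window, so we skip them.  Names and values are hypothetical. */
#define LAZY_MARGIN 200 /* two pawns */

static int eval_positional(void) { return 15; } /* stub for the slow terms */

int evaluate(int material, int alpha, int beta)
{
    if (material - LAZY_MARGIN >= beta || material + LAZY_MARGIN <= alpha)
        return material;                 /* far outside window: stay lazy */
    return material + eval_positional(); /* close call: pay for full eval */
}
```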

Played 20 game match versus 2.13, 3+3.

Result: win 4, lose 8, draw 8.  Score 8/20 (40%)

Average 1,100,000 nps.  (2.13 was about 960,000 nps)
Average search depth about +0.5 (-1 .. +2) ply relative to 2.13.

~~~~~~~~~~~~~~~~

2.18p

Has evaluation with only material eval, piece/square tables and pawn structure
evaluation.  Uses lazy evaluation with fixed +/-2 pawns boundary.
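A typical pawn-structure term scores things like doubled and isolated pawns from per-file pawn counts; a minimal sketch (penalty values are illustrative, not GLC's):

```c
/* Pawn-structure sketch: penalties for doubled and isolated pawns,
   computed from one side's per-file pawn counts.  Values illustrative. */
int eval_pawn_structure(const int pawns_on_file[8])
{
    int score = 0;
    for (int f = 0; f < 8; f++) {
        int n = pawns_on_file[f];
        if (n == 0)
            continue;
        if (n > 1)
            score -= 15 * (n - 1);       /* doubled (or tripled) pawns */
        int left  = f > 0 ? pawns_on_file[f - 1] : 0;
        int right = f < 7 ? pawns_on_file[f + 1] : 0;
        if (left == 0 && right == 0)
            score -= 12 * n;             /* isolated pawns */
    }
    return score;
}
```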

Played 20 game match versus 2.13, 3+3.

Result: win 6, lose 8, draw 6.  Score 9/20 (45%)

Average 1,150,000 nps.  (2.13 was 1,030,000)  (PC was recently re-booted)
Average search depth about +0 ply relative to 2.13.

~~~~~~~~~~~~~~~~

2.18q

Has evaluation with only material eval, piece/square tables and pawn structure
evaluation.  Uses lazy evaluation with fixed +/-2 pawns boundary.  Quantizes the
eval using "eval = (eval + 0x20) & ~0x3F;" to get to the nearest 0.064 of a
pawn.

Played 50 game match versus 2.13, 3+3.

Result: win 4, lose 27, draw 19.  Score 13.5/50 (27%)

Average 1,130,000 nps.  (2.13 was 1,000,000)  (PC was recently re-booted)
Average search depth about ?? ply relative to 2.13.

>
>Michael

Cheers, Tim.


