Computer Chess Club Archives



Subject: Statistics, computer evaluations and some trends

Author: JNoomen

Date: 03:12:50 11/19/05


Thanks for all your comments on my Fruit-Diep report. At least it has given me
some ideas for a new posting, which is given below. I want to discuss 4 topics I
have wanted to write about for some time, but I never took the time to do so.
Your reactions gave me the necessary 'boost' to sit down and write them down.

Statistics
--------------
In many database programs and chess GUIs there is a nice option that shows the
exact statistics of the moves played in a position, as well as the average Elo
of the players who played them. This gives insight into how often a move was
played, what its success rate is and what type of players (strong, less strong)
chose it. When building a book such statistics are very helpful and they can
point you towards the best move in a position. When building a book with, for
example, the ChessBase GUI, using 200.000 GM games over 2500 Elo, you already
have quite a decent book.
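
As a rough illustration of what such a statistics option computes, here is a minimal Python sketch. The record layout, the function name `move_statistics` and the five sample games are all made up for this example; a real GUI reads these numbers from its game database.

```python
from collections import defaultdict

# Hypothetical records for one position:
# (move played, Elo of the player who played it, result for that player).
# Result: 1.0 = win, 0.5 = draw, 0.0 = loss. The data is invented.
games = [
    ("e4", 2600, 1.0),
    ("e4", 2550, 0.5),
    ("d4", 2700, 0.5),
    ("e4", 2650, 0.0),
    ("d4", 2500, 1.0),
]

def move_statistics(records):
    """Return games played, score fraction and average Elo per move."""
    totals = defaultdict(lambda: {"count": 0, "points": 0.0, "elo_sum": 0})
    for move, elo, result in records:
        t = totals[move]
        t["count"] += 1
        t["points"] += result
        t["elo_sum"] += elo
    return {
        move: {
            "games": t["count"],
            "score": t["points"] / t["count"],
            "avg_elo": t["elo_sum"] / t["count"],
        }
        for move, t in totals.items()
    }

stats = move_statistics(games)
for move, s in stats.items():
    print(move, s["games"], f"{s['score']:.0%}", round(s["avg_elo"]))
```

Note that the score here is only the game result, which is exactly the limitation discussed in the points that follow.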

Still there are some points that one should be well aware of:

1. The game statistics are between humans and not between chess programs.
2. The score of a game (0-1, 1-0 or draw) says nothing about the actual
evaluation of a move.
3. Between the opening moves and the final result there is a complete game to
watch, with its ups and downs, its mistakes and brilliant moves and even
blunders.
4. A score of 25% for a move in a certain position does not necessarily mean
that the move is bad. In fact it can be the only move or the best move
available.
5. When using games only, you have no moves that will punish bad lines, or
defend against unsound attacks, because such lines are never played between
strong players.

To add to the discussion, I will give you a very clear example. There are 2
choices:

A) Move A scores 90% in 100 human games and is regarded by humans as the only
and the best move. However, your chess program dislikes it and after several
tries the score against other progs is only 35%.

B) Move B scores only 35% in 100 human games and is regarded by humans as
inferior to move A. When testing this move, you notice that your program quite
likes it and in several test games (using the same progs as under A) the score
is an impressive 75%.

Now what would you do? Choose the move about which all humans say it is best
(but gives a clearly inferior score for your program), or forget about the human
statistics and use your own statistics, which tell you the program likes it and
scores well with it?

To put it in other words: IMO the use of statistics from human games is quite
useful in building a book, but don't follow them blindly.
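
The choice between A and B could be written down as a simple rule, sketched below in Python. This is only one possible rule, not how any existing book tool works: the function `pick_book_move`, the field names and the threshold of 20 test games are my assumptions (the example above only says "several" test games).

```python
def pick_book_move(candidates, min_test_games=20):
    """Prefer the program's own test results when there are enough test games;
    otherwise fall back to the human statistics.

    candidates maps a move name to a dict with
    'human_score', 'test_score' (both as fractions) and 'test_games'.
    min_test_games is an arbitrary, assumed threshold.
    """
    tested = {m: c for m, c in candidates.items()
              if c["test_games"] >= min_test_games}
    if tested:
        return max(tested, key=lambda m: tested[m]["test_score"])
    return max(candidates, key=lambda m: candidates[m]["human_score"])

# Scores from the example; the number of test games (20) is assumed.
candidates = {
    "A": {"human_score": 0.90, "test_score": 0.35, "test_games": 20},
    "B": {"human_score": 0.35, "test_score": 0.75, "test_games": 20},
}

print(pick_book_move(candidates))
```

With enough test games the rule picks move B; with too few it falls back to the human favourite, move A.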

Computer evaluations
-------------------------------
Today's programs are so strong that in a normal game they can beat the top GMs.
A clear trend is to put the program in 'aggressive mode' and let it furiously
attack the opponent's king. To get to this point, many computer programs have
so-called 'optimistic' or 'risky' evaluations. Watching test games, I have seen
many times that progs don't give a good eval of the position (if I find
examples, I will post them). Tactically they are 3000+ Elo, but in many
positions their eval doesn't match the board position. Many progs tend to be
(over) optimistic, showing a clear plus that is not even there.

As far as I am concerned: I tend to distrust computer evaluations of a specific
position. I think that a computer eval says more about whether the program likes
the resulting position or not than about the clean and pure eval of the position
itself. For me, the only truth is the position on the board. Neither statistics
nor the evals of the top programs will influence my choice, but a very good
analysis of the available options in the given position.

In Sicilians with opposite castling most programs tend to like Black. Even in
positions where top players agree White is better, programs show a positive
score for Black. This is quite easy to explain: programs see the open c-file
and a pawn storm, and they give a bonus for that. But that is too simple a basis
for an eval of such a position. In the Fruit-Diep game many programs will like
Black. Interestingly, Diep thought White was better. Giving a mere bonus for an
open file and a pawn storm will turn out successful in many games, so there is
obviously no reason to change this. But it is my view that humans can take
advantage of that: computers overrate their attack.

Fruit takes a more sophisticated approach. I like that very much. Somehow, as a
human, I'd like to see the pure and realistic eval of a position. Pro Deo/Rebel
did that too, and I liked that. That is the explanation why Pro Deo scores so
heavily in Sicilians. It doesn't believe it is worse, defends successfully,
takes the pawns offered and launches a counterattack.

How to evaluate this position
-----------------------------------------
Here is another position on which I would very much like to share thoughts with
you. Suppose you have a slightly better ending and there is a deep line that
gives the opportunity to exchange into a R+B vs. R ending. Of course your
program has tablebases, so it will calculate a score of 0,00 for that position
and refuse to play it, since it has a slightly better position and that is
better than a 0,00 score. But will that choice maximise the winning chances of
the program in question? My simple answer is: it depends on the opponent. Let's
see:

1. My opponent is a very strong computer program, using EGTB. Okay, that is
easy: it will see all drawing lines, so the score of the resulting R+B vs. R is
0,00. Go for the slightly better position.

2. My opponent is a strong computer program that uses no EGTB. Now there is a
problem: I don't think a score of 0,00 for the R+B vs. R is correct here. It
is very likely the other prog will see a -3 disadvantage and won't know how to
defend correctly. Here my choice would be: go for the R+B vs. R ending. Evaluate
it as a clear advantage, well above the slight advantage in the ending.

3. My opponent is a very strong GM. I think in this case it doesn't matter much.
He will know how to defend a R+B vs. R ending, but there will always be a chance
he goes wrong. On the other hand, the slightly better ending might be easy to
draw, so it is unclear what the best option would be. In both options I would
say +0,25 is a reasonable eval.

4. My opponent is a weaker player. Still questionable, as he might lose the
slightly worse ending. However, I would say there is a very good chance he will
lose the resulting R+B vs. R position. Not knowing how to defend this position,
he is very likely to go wrong. So in this special case I tend to evaluate the
R+B vs. R ending higher than the slightly better ending.

A normal computer program with EGTB doesn't make such choices. R+B vs. R is a
draw. Score 0,00. But I don't think that is correct. It depends on the opponent.
So EGTB will not maximise the winning chances, as the program will always avoid
this ending, and in situations 2 and 4 it is clearly better to go for the
'clearly drawn' R+B vs. R ending.
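
To make the four cases concrete, here is a small Python sketch of an opponent-dependent score for the tablebase-drawn ending. It is only an illustration of the idea, not something any program I know of does. The values 0.00 and 0.25 follow the cases above; 1.00 and 0.75 are placeholders I invented for "clear advantage" and "higher than the slightly better ending", and the names `PRACTICAL_RB_VS_R` and `should_enter_rb_vs_r` are hypothetical.

```python
# Practical score (in pawns) for entering the R+B vs. R ending, per opponent
# type, instead of the flat tablebase score of 0.00.
# 0.00 and 0.25 come from the cases above; 1.00 and 0.75 are invented
# placeholder values for illustration only.
PRACTICAL_RB_VS_R = {
    "engine_with_egtb": 0.00,     # case 1: it will find every drawing line
    "engine_without_egtb": 1.00,  # case 2: "clear advantage" (assumed value)
    "strong_gm": 0.25,            # case 3: about the same as the slight edge
    "weaker_player": 0.75,        # case 4: likely to go wrong (assumed value)
}

def should_enter_rb_vs_r(opponent, current_eval):
    """True if the practical score of R+B vs. R beats the current eval."""
    return PRACTICAL_RB_VS_R[opponent] > current_eval

slight_edge = 0.25  # assumed eval of the slightly better ending
for opponent in PRACTICAL_RB_VS_R:
    print(opponent, should_enter_rb_vs_r(opponent, slight_edge))
```

With these numbers the program keeps the slightly better ending in cases 1 and 3 and exchanges into R+B vs. R in cases 2 and 4.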

Trends in human thinking
------------------------------------
Watching many postings here, I see a very interesting trend: posters put a
position in their computer, let a top program calculate several lines and when
the program says +1,0, the general consensus is that the prog's side has the
better position and that somehow the other colour must have made a mistake.

My tip: it is very dangerous to do so! Just following computer variations and
evals without properly assessing a position might give you a completely false
idea about the position. There is only one way to find the truth about a
position: delve deeply into it and watch it objectively.

Jeroen




Last modified: Thu, 15 Apr 21 08:11:13 -0700
