Computer Chess Club Archives



Subject: Re: Mchess Pro 7.1

Author: Don Dailey

Date: 15:34:40 12/13/97

Hi again!

>>Let's say we get a 7 to 3 score but in fact it's the stronger player
>>that lost this very short match.   Now you are telling me that you could
>>look at the games and see that in fact the guy who scored 3 points is
>>the stronger player?  But the games themselves indicated that the weaker
>>player in fact played the better game (for whatever reason).
>
>Right. Exactly!! Because with a chess program, unlike with a human
>being, you can SEE why it loses. A human being often loses for
>non-chess reasons. A computer program has bugs too. But apart from
>bugs, a computer program has many ways to misunderstand chess concepts.
>And THESE misunderstandings can lead to a loss. And you can see these
>misunderstandings IN THE GAMES.

>>Here is a thought experiment:
>>
>>Take 2 programs, A and B such that A is 100 rating points stronger than
>>B based on lots of evidence. Play a series of games. Pick out 10 wins
>>from player A, 5 white and 5 black. Do the same for player B. Make
>>sure the games are selected completely at random.
>>
>>Do you believe you can easily tell who the better player is? I am
>>betting that you have no better than a 50 percent chance of correctly
>>identifying the better player.
>
>Who ever said the experiment has to be done THIS way???
>Who forced Einstein to show his capabilities by throwing him out of
>an airplane and telling him: if you are Einstein, you will find a way
>not to die...
>
>That's nonsense. The person gets the programs at his home; he needs a
>few days, let's say 30 days. In these 30 days he plays maybe 20 games
>at 40/120. And normally, after his studies, he knows enough about the
>program to rate it in ELO.

I made the point earlier that a 7 to 3 result was almost meaningless.
But you responded with the notion that the games themselves provided
lots of useful data.   You definitely implied that a 7 to 3 result was
adequate when you take into consideration the quality of moves also.
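To put some numbers on why a 7 to 3 result says so little, here is a small sketch. It assumes the standard logistic Elo formula, that every game is decisive (no draws), and that games are independent; the function names are just illustrative.

```python
from math import comb, log10


def elo_from_score(score_fraction: float) -> float:
    """Elo difference implied by a score fraction, using the logistic Elo curve."""
    return -400 * log10(1 / score_fraction - 1)


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more wins in n decisive games."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A 7-3 score taken at face value suggests roughly a 147 Elo edge...
print(round(elo_from_score(0.7)))  # 147
# ...yet two EQUALLY strong programs give one particular side 7 or more
# wins out of 10 about 17% of the time,
print(prob_at_least(7, 10))  # 0.171875
# and a result at least that lopsided in either direction about 34% of the time.
print(2 * prob_at_least(7, 10))  # 0.34375
```

In other words, under these assumptions a 7 to 3 match between two programs of identical strength is roughly a one-in-three event, before anyone even looks at the moves.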

I never said that testing should be done that way; it was just a thought
experiment. The idea was to eliminate the results of the games from
consideration and see if you can make your quality judgements based on
the quality of the moves only.

But after rethinking the experiment, I realize it probably was not
constructed properly.   It does force you to judge the better player
without having access to the actual results (which is what I was trying
to do), but it also has the undesirable effect of eliminating more good
moves from the better player (and this would unfairly penalize your
judgement of which program is better).

But my real point is that the proof is in the pudding.  If player A
and player B play a single game, you might learn a whole
lot about their styles and strengths and weaknesses from looking at
the game, but I believe you have to say the winner played the better
game; after all, he won.   A human might look at the game and FEEL
like the loser played better chess, but this cannot be the case.

From the human's point of view, quality is in the moves, ideas,
strategy and plans, and as humans we see beauty on the chess board.
But the cold hard facts of who gets the best results consistently
determine who is the better player.  I have known my share of
boring chess players who seem to have no imagination, but some
of these players get great results.

>And then let us wait 4 or 5 months until the Swedish guys have played
>THEIR required number of games to come to the same result (IF they don't
>make mistakes in HOW TO OPERATE THE PROGRAMS ACCURATELY :-).
>
>>If the difference in strength is huge, you can probably see this with
>>a smaller sample, which is exactly the way it is with normal testing.
>
>Sorry, this is against all of "HIS" experience. He does not need a
>massive number of games. "HE" does not need many games to tell apart
>a good engine and another one that is maybe 10 ELO points
>stronger.
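For a sense of scale on the "huge difference" versus "10 ELO points" question, here is a rough sketch of how many games it takes before the ~95% confidence interval on the match score shrinks to the size of the expected edge. It is a normal-approximation rule of thumb, not a formal power analysis; it assumes the logistic Elo expected-score formula and independent games, and the names are hypothetical.

```python
from math import ceil


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


def games_needed(elo_diff: float, z: float = 1.96, per_game_sd: float = 0.5) -> int:
    """Games until z * sd / sqrt(n) (the ~95% half-width) falls to the expected edge.

    per_game_sd = 0.5 is the largest possible spread for a per-game score in
    [0, 1] (every game decisive, evenly split); draws reduce it and the count.
    """
    edge = expected_score(elo_diff) - 0.5
    return ceil((z * per_game_sd / edge) ** 2)


print(games_needed(100))  # about 50 games
print(games_needed(10))   # roughly 4,600 games
```

Under these assumptions a 100 Elo gap shows up within a few dozen games, while a 10 Elo gap needs thousands; a high draw rate lowers both counts, since it reduces the per-game spread.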

>But Don, a game is more than the moves. In a normal game I have around
>40 or even 80 positions. I see the main lines, evaluation, and search
>depth 80 times in parallel!! I can correlate the data machine A presents
>with the data machine B presents and find out by correlating the data.
>NO MACHINE can correlate this as sensibly as a human being. You need to feel!!

>If I am wrong, please tell me.

You could be right; I don't really know for sure.  Nahh, you're probably
wrong.

I definitely believe the necessary data to measure chess strength
exists within the moves themselves.  I'm much less certain how well
PEOPLE can do this based on their own judgements.

In my experience, humans are irrational creatures, far too subject
to their emotions, with strong biases that cloud their judgement.
We do have great pattern recognition abilities, which you have
alluded to, but our judgement is pretty poor.



-- Don



