Subject: Re: Self-test and others rating stuffs...

Author: Don Dailey

Date: 09:10:09 01/02/98

On January 01, 1998 at 16:10:34, Christophe Theron wrote:

>On January 01, 1998 at 02:24:09, Don Dailey wrote:
>>I did another interesting test once.   I took a randomized database of
>>positions with master moves and noted the master responses.  I used
>>a huge sample of about 20 thousand positions.  I tested on 2, 3, 4,
>>5, etc. plies just to see how often Socrates matched the master move.
>>I found a very nice smooth improvement with depth.   I thought finally,
>>maybe this is a decent way to measure improvement!  I would get 100's
>>more problems on each level jump.
>>So then I decided to turn off all the big pawn structure stuff and try
>>the test.  I self tested thoroughly to verify that pawn structure was
>>indeed a MAJOR source of strength in Socrates, it was worth perhaps
>>100 rating points or more.
>>The results at a given depth came out virtually the same!   I  was
>>completely baffled.   I didn't check into this too much further but
>>my hypothesis now is that there is no concept of "weighting" here.
>>Not playing a master move is not the same as making a horrible pawn
>>structure error and this test gives them the same weight.
>This is VERY interesting and strange. I had the same idea to take a
>bunch of master games and test the similarity between my program's moves
>for each position in each game with the master's move (only for the
>winning side maybe). But the information you give suggests that I can
>live without doing this test...
>    Christophe

Actually it would be great to get some verification from a different
source.   I would love to see the test repeated by someone else.

  Try to show:

  A) Increasing depth solves a lot more "problems"
  B) Turning off some evaluation (pawn structure) has little effect.
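
For anyone who wants to repeat it, the bookkeeping could look roughly like
the sketch below.  This is not the code I used with Socrates; the names
match_rates, pick_move and toy_engine are made up for illustration, and
pick_move stands in for whatever call asks your own program for its move
at a fixed depth.

```python
from collections import defaultdict


def match_rates(positions, pick_move, depths):
    """Return {depth: fraction of positions where the engine move equals the master move}.

    positions : list of (fen, master_move) pairs
    pick_move : hypothetical callable (fen, depth) -> move, wrapping your own engine
    depths    : iterable of search depths in plies
    """
    if not positions:
        raise ValueError("need at least one position")
    depths = list(depths)
    hits = defaultdict(int)
    for fen, master_move in positions:
        for depth in depths:
            if pick_move(fen, depth) == master_move:
                hits[depth] += 1
    total = len(positions)
    return {depth: hits[depth] / total for depth in depths}


# Toy stand-in for a real engine (an assumption, only here so the sketch runs):
# each fake position is "solved" once the search reaches a given depth.
SOLVED_AT = {"pos1": 2, "pos2": 3, "pos3": 5, "pos4": 99}


def toy_engine(fen, depth):
    return "master" if depth >= SOLVED_AT[fen] else "other"


positions = [(name, "master") for name in SOLVED_AT]
rates = match_rates(positions, toy_engine, depths=[2, 3, 4, 5])
for depth, rate in rates.items():
    print(f"depth {depth}: {rate:.0%} matched")
```

To check B), run match_rates a second time with a pick_move that has the
pawn structure terms switched off and compare the two tables depth by depth.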

Before the pawn structure test took place the result was quite
and it might be very useful to figure out what happened and see if
we can come up with something better.

I can give you a database of several thousand FEN positions.  I started
with 20 thousand and threw out all positions with more than 1 response.
The positions were randomly distributed samples from one of the big
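
The filtering step can be sketched as follows, assuming "more than 1
response" means the same position turned up with different master moves.
The function name keep_unambiguous and the sample records are made up for
illustration and are not taken from the real database.

```python
from collections import defaultdict


def keep_unambiguous(records):
    """Keep only positions that were seen with exactly one distinct master move.

    records : iterable of (fen, master_move) pairs, possibly with repeats
    Returns a list of (fen, master_move) in first-seen order.
    """
    moves_by_fen = defaultdict(set)
    for fen, move in records:
        moves_by_fen[fen].add(move)
    return [
        (fen, next(iter(moves)))
        for fen, moves in moves_by_fen.items()
        if len(moves) == 1
    ]


# Made-up sample: position "a" has two different replies, so it is dropped.
sample = [("a", "e4"), ("a", "d4"), ("b", "Nf3"), ("b", "Nf3"), ("c", "c4")]
filtered = keep_unambiguous(sample)
print(filtered)
```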

-- Don

