Computer Chess Club Archives



Subject: Re: How to compare with Deep Blue '97 (methodically)?

Author: Dann Corbit

Date: 19:45:10 08/02/01



On August 02, 2001 at 22:22:52, Uri Blass wrote:

>On August 02, 2001 at 20:30:16, Mike S. wrote:
>
>>There may be a chance to get a *rough estimate* of whether a computer chess
>>system is at (or even above) Deep Blue '97 level, if somebody were able to
>>distil at least 10 good test positions from the 1997 match games. I can imagine
>>that this could be done, supported by the Deep Blue logs, which I think are
>>downloadable somewhere on the net (I'm sure the URL is easy to find). I've heard
>>they are somewhat difficult to read though (?).
>>
>>Preferably, we should search for "single move" situations, i.e. when D.B.
>>recognised a subtle threat of Kasparov's and found the clearly best defensive
>>move early, or played such a threat itself, etc. We would need to find positions
>>that can serve as - very difficult - test positions. The log data (hopefully)
>>shows the time D.B. needed to find each of those moves. I don't expect that more
>>than 10 suitable positions can be found (if any at all), which is a small number -
>>but still much better than comparing node rates or whatever.
>>
>>Then today's chess computer systems could be tested with those positions, and we
>>would have at least some hard facts for comparison instead of speculation. If a
>>program can find, let's say, 8 or 9 out of 10 in similar, sometimes better, time,
>>I'd consider it to be at Deep Blue level. So we could compare performance... and
>>you know it, only the performance counts! :o)
>>
>>Please give your opinions if this idea makes sense, which I want to read before
>>I start searching those logs, analyzing, testing, etc. (hopefully the idea is
>>nonsense and I can save the effort :o).
>
>I think that the idea is not nonsense.
>There was no hard tactical move to find but there are positional moves to find.
>
>My suggestion is:
>1) Look at all positions from the match (Deep Blue to move)
>or not from the match (Deep Blue pondering on moves that were not played).
>
>2) Choose from these positions only the positions where Deeper Blue changed its
>mind after more than 1 second.
>
>3) Find among these positions all the positions where all top programs converge
>on the same move that Deeper Blue played and where it is not trivial for them
>(most top programs cannot do it in less than 1 second).
>
>You need to give the top programs some hours for every position.
>
>You can compare the times of top programs with the time of Deeper Blue after you
>find the relevant positions.
>
>Note that this experiment is biased in favor of Deeper Blue because it contains
>only positions where Deeper Blue is probably right (all programs agree), but in
>spite of this fact I do not expect Deeper Blue to show clear superiority in this
>experiment.
>
>It is possible to get an estimate of how much it is biased by doing the same
>experiment for other programs (for example, using Shredder 4's games against
>humans in the Israeli league to estimate whether it is better or worse than
>programs like Deep Fritz).
>
>
>I checked something similar in the past to get an estimate of the strength of
>Deeper Blue.
>I checked the times that programs need to see a PV similar to Deeper Blue's in
>some positions, and I found cases where Deep Fritz on a PIII 800 was only 2 or 3
>times slower than Deeper Blue, so my impression is that Deeper Blue is not better
>than Deep Fritz on good hardware.
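
For readers who want to try it, the three selection steps quoted above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the Position record, its field names, and the "more than half of
the engines" reading of "most top programs" are all hypothetical, and the
timings would have to be extracted by hand from the Deep Blue logs and from
engine runs.

```python
from dataclasses import dataclass


@dataclass
class Position:
    # All field names here are hypothetical; they are not taken from the logs.
    fen: str                 # some identifier for the position
    db_move: str             # move Deeper Blue finally settled on
    db_last_change_s: float  # seconds after which it last changed its mind
    engine_results: dict     # engine name -> (final move, seconds to settle on it)


def select_test_positions(positions, min_time_s=1.0):
    """Apply steps 2 and 3 of the quoted procedure to a list of positions."""
    selected = []
    for p in positions:
        # Step 2: Deeper Blue changed its mind after more than min_time_s.
        if p.db_last_change_s <= min_time_s:
            continue
        results = list(p.engine_results.values())
        # Step 3a: every engine converges on the move Deeper Blue played.
        if not results or any(move != p.db_move for move, _ in results):
            continue
        # Step 3b: not trivial, read here as "more than half of the engines
        # need longer than min_time_s" (an assumption about "most").
        slow = sum(1 for _, secs in results if secs > min_time_s)
        if slow * 2 <= len(results):
            continue
        selected.append(p)
    return selected


def time_ratios(p):
    """Engine time divided by Deeper Blue's time for one selected position."""
    return {name: secs / p.db_last_change_s
            for name, (_, secs) in p.engine_results.items()}
```

A ratio of 2.0 from time_ratios would correspond to an engine being "2 times
slower" in the sense used in the quoted post.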

I think it is absurd to try to judge the strength of a program from 100 games.
Will we do it from 100 moves?
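
To put a rough number on the 100-game case, here is a minimal sketch of the
error margin on an Elo estimate. It assumes independent games, the usual
logistic Elo formula, and a normal approximation for the score; the 40 wins,
30 draws, 30 losses result is made up purely for illustration.

```python
import math


def elo_from_score(s):
    """Elo difference implied by an expected score s, with 0 < s < 1."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Approximate 95% interval (low, estimate, high) for the Elo difference.

    Uses a normal approximation on the per-game score; assumes the interval
    for the score stays strictly inside (0, 1).
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - s) ** 2
           + draws * (0.5 - s) ** 2
           + losses * (0.0 - s) ** 2) / n
    se = math.sqrt(var / n)
    return (elo_from_score(s - z * se),
            elo_from_score(s),
            elo_from_score(s + z * se))


# Hypothetical 100-game result: 40 wins, 30 draws, 30 losses.
low, est, high = elo_interval(40, 30, 30)
print(f"Elo difference: {est:.0f} (95% interval {low:.0f} to {high:.0f})")
```

With that made-up result the interval is more than a hundred Elo points wide
and includes zero, so 100 games alone would not even settle which side is
stronger.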

Sometimes, programs that I work on may make a smart move.  Often -- for the
wrong reason altogether.

What happens if a move is so brilliant that nobody gets it except the machine/GM
who made it?

I just don't believe that this approach works.


On the other hand, there are a lot of people who agree with you.  I don't know
how many times I have heard someone say they know exactly how strong a program
is by simply examining the moves of one game.

Surely you will agree (at least) that any such measures are purely subjective.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
