Computer Chess Club Archives


Subject: Re: How to compare with Deep Blue '97 (methodically)?

Author: Uri Blass

Date: 20:15:54 08/02/01


On August 02, 2001 at 22:45:10, Dann Corbit wrote:

>On August 02, 2001 at 22:22:52, Uri Blass wrote:
>
>>On August 02, 2001 at 20:30:16, Mike S. wrote:
>>
>>>There may be a chance to get a *rough estimation* of whether a computer chess
>>>system is at (or even above) Deep Blue '97 level, if somebody were able to distil
>>>at least 10 good test positions from the 1997 match games. I can imagine that
>>>this could be done, supported by the Deep Blue logs which are downloadable
>>>somewhere on the net I think (I'm sure the URL is easy to find). I've heard they
>>>are somewhat difficult to read though (?).
>>>
>>>Preferably, we should search for "single move" situations, i.e. when D.B.
>>>recognised a subtle threat of Kasparov and found the clearly best defensive move
>>>early, or played such a threat itself, etc. We would need to find positions
>>>that can serve as very difficult test positions. The log data (hopefully)
>>>shows the time D.B. needed to find each of those moves. I don't expect that more
>>>than 10 suitable positions can be found (if at all), which is a small number -
>>>but still much better than comparing node rates or whatever.
>>>
>>>Then, today's chess computer systems could be tested with that, and we would
>>>have at least some hard facts comparison instead of speculations. If a program
>>>can find, let's say, 8 or 9 out of 10 in a similar, sometimes better, time, I'd
>>>consider it to be at Deep Blue level. So we could compare performance... and you know
>>>it, only the performance counts! :o)
>>>
>>>Please give your opinions if this idea makes sense, which I want to read before
>>>I start searching those logs, analyzing, testing, etc. (hopefully the idea is
>>>nonsense and I can save the effort :o).
>>
>>I think that the idea is not nonsense.
>>There was no hard tactical move to find but there are positional moves to find.
>>
>>My suggestion is:
>>1) Look at all positions from the match (Deep Blue to move),
>>or not from the match (Deep Blue pondering on moves that were not played).
>>
>>2) Choose from these positions only those where Deeper Blue changed its
>>mind after more than 1 second.
>>
>>3) Find among these positions all those where all top programs converge on
>>the same move that Deeper Blue played, and where it is not trivial for them (most
>>top programs cannot find it in less than 1 second).
>>
>>You need to give the top programs some hours for every position.
>>
>>You can compare the times of the top programs with the time of Deeper Blue after
>>you find the relevant positions.
>>
>>Note that this experiment is biased in favour of Deeper Blue because it contains
>>only positions where Deeper Blue is probably right (all programs agree), but in
>>spite of this fact I do not expect Deeper Blue to show clear superiority in this
>>experiment.
>>
>>It is possible to estimate how much it is biased by doing the same
>>experiment for other programs (for example, using Shredder4's games against humans
>>in the Israeli league to estimate whether it is better or worse than programs like
>>Deep Fritz).
>>
>>
>>In the past I checked something similar to get an estimate of the strength of
>>Deeper Blue.
>>I checked the times that programs need to see a PV similar to Deeper Blue's in some
>>positions, and I found cases where Deep Fritz on PIII800 was only 2 or 3 times
>>slower than Deeper Blue, so my impression is that Deeper Blue is not better than
>>Deep Fritz on good hardware.
>
>I think it is absurd to try to judge the strength of a program from 100 games.
>Will we do it from 100 moves?
>
>Sometimes, programs that I work on may make a smart move.  Often -- for the
>wrong reason altogether.
>
>What happens if a move is so brilliant that nobody gets it except the machine/GM
>who made it?

Unfortunately we have to ignore these moves and use only the moves that
everybody can find after a long time.

"Everybody" here should include Deeper Blue, so that we ignore the possible cases
where Deeper Blue made a move that is so brilliant, or simply so wrong, that no
program can find it.
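
A rough sketch of the selection in steps 1-3 quoted above, written in Python. It is not code anyone in this thread ran: the Position record, its field names and the two threshold parameters are assumptions made for illustration, and the times would have to be read from the Deep Blue logs and from long engine analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    # All fields are hypothetical; the data would be collected by hand.
    fen: str
    db_move: str            # move Deeper Blue finally chose
    db_settle_time: float   # seconds until Deeper Blue last changed its mind
    # engine name -> (final move, seconds until the engine settled on it)
    engines: dict = field(default_factory=dict)


def select_test_positions(positions, min_db_time=1.0, min_engine_time=1.0):
    """Keep positions where Deeper Blue changed its mind after more than
    min_db_time seconds, every engine ends on Deeper Blue's move, and most
    engines need at least min_engine_time seconds to get there."""
    selected = []
    for pos in positions:
        if pos.db_settle_time <= min_db_time:
            continue  # step 2: no late change of mind by Deeper Blue
        if not pos.engines:
            continue  # nothing to compare against
        if any(move != pos.db_move for move, _ in pos.engines.values()):
            continue  # step 3: the programs do not all agree with Deeper Blue
        slow = sum(1 for _, t in pos.engines.values() if t >= min_engine_time)
        if slow * 2 <= len(pos.engines):
            continue  # trivial: most programs find the move almost at once
        selected.append(pos)
    return selected
```

Requiring every engine to agree with Deeper Blue is what discards both the "too brilliant" and the "simply wrong" moves, at the cost of the bias already noted in the quoted post.
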
>
>I just don't believe that this approach works.
>
>
>On the other hand, there are a lot of people who agree with you.  I don't know
>how many times I have heard someone say they know exactly how strong a program
>is by simply examining the moves of one game.
>
>Surely you will agree (at least) that any such measures are purely subjective.

I do not suggest only looking at the moves but also analyzing them by giving
programs many hours, and this is the difference between me and other people.

I agree that we cannot be sure about the strength of the program by this
analysis but we can get an estimate for it.
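
One possible way to turn the time comparison into a single number is a geometric mean of the per-position time ratios. This is only a sketch under assumptions: the choice of a geometric mean and the function name are mine, not something specified in this thread, and any numbers fed to it would have to come from the logs and from real engine runs.

```python
import math


def geometric_mean_ratio(engine_times, db_times):
    """Geometric mean of engine_time / db_time over the selected positions.
    A result of 2.0 would mean the engine is, on average, twice as slow
    as Deeper Blue on these positions."""
    if len(engine_times) != len(db_times) or not engine_times:
        raise ValueError("need two non-empty lists of equal length")
    logs = [math.log(e / d) for e, d in zip(engine_times, db_times)]
    return math.exp(sum(logs) / len(logs))
```

A geometric mean is used here so that one position where a program is 100 times slower does not swamp several positions where it is faster; with only a handful of positions the result is still just a rough estimate.
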

Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700
