Computer Chess Club Archives

Subject: Re: Deep Blue--Part III

Author: Bruce Moreland
Date: 01:10:55 05/11/98

On May 10, 1998 at 00:52:18, Keith Ian Price wrote:

>9. One of the longest-running arguments on rgcc and CCC has been how
>well micros might fare against Deep Blue. During the Deep Blue
>excitement last year the news slipped out that there had been a match
>between DB, Jr. and Rebel 8 and Genius. DB, Jr. was supposed to have
>been slowed down to match the PC's speed somehow. I asked Hsu about this
>10-game match. He was quite familiar with the results. He confirmed that
>there had been 5 games against each opponent. He stated that there was
>only one chess processor used, and that its clock speed had been
>halved. He also said that several pruning algorithms were turned off,
>with some selective extensions, in order to emulate the performance of
>the micro hardware as much as possible. They did this to see how well
>they did against the micros on an evaluation specific level, keeping the
>speed advantage down to the difference between what the micros could
>evaluate given their nps levels, and what could be accomplished in the
>chess specific processor evaluation, rather than how many nodes were
>searched. Since the speed of a single chess processor is about 2-2.5
>million nodes per second, and Hsu estimated that the removal of the
>algorithms caused a 5-10 times reduction in nodes searched, the probable
>nps level for DB, Jr. was somewhere between 100,000 and 250,000 with the
>clock speed reduction factored in. This is similar to the fast
>searchers, but is probably 2-5 times faster than Rebel 8 at the time. In
>any case, I asked how the games went, and Hsu pulled no punches. He said
>that the performance of the micros was much poorer than he had imagined
>they would be. He said all 10 games were basically blowouts. When I
>asked for specifics, he mentioned two examples against Rebel that had
>surprised him as to how little understanding they had of endgames and
>King safety. In the first example, the ending was with bishops of
>opposite color and normally would have been a draw. Rebel allowed an
>exchange which gave DB two widely separated passed pawns, and there was
>no way to stop both. Rebel did not realize until a few more moves that
>it was in trouble. Hsu said this was the kind of thing that is in his
>evaluation routines, and he was surprised that it was not in Rebel's.
>The second example was where DB sacrificed a Rook for a pawn next to
>Rebel's King. After the exchange, Hsu reported, Rebel showed 2+ pawns
>advantage. DB showed a .5 pawn advantage. A couple moves later, DB went
>up to a much higher advantage, and Rebel still showed +2. After a few
>more moves, Rebel suddenly realized it was busted, and dropped its eval
>way down. Hsu thought this was due to a minimal King safety evaluation.
>He did state that even with this, he thought Rebel had a much better
>understanding of positional play than Genius did. I asked him if it were
>possible to get scores of these games. He said he did not want to
>release them, as he did not want to give out any help to future
>competitors. I mentioned that he had said the chance of Deep Blue ever
>playing another match was almost nil, and so there should not be any
>future competitors. He responded that if he got the rights to the chess
>processors, Rebel and Genius would likely be the future competitors, and
>he wanted to leave his options open. I stated that even so, once
>released, there would be thousands of games available rather quickly,
>and that these 10 would not make much difference. He said that he wasn't
>even sure if the game scores had been saved. I realized that he was not
>going to let them out, so I suggested that if he found them, not to
>erase them, as there were a lot of people interested in them, and I
>moved on.

Sorry about the big quote, but that was a big paragraph.
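
The NPS estimate in the quoted paragraph is plain arithmetic, and it can be checked in a few lines. This is only a sketch of that arithmetic: the function name and parameters are made up for illustration, and the inputs are the figures Keith reports from Hsu (2-2.5 million nodes per second per chess processor, clock speed halved, a 5-10 times reduction from the disabled algorithms).

```python
def nps_range(base_low, base_high, clock_factor, reduction_low, reduction_high):
    """Return (low, high) estimated nodes per second after handicapping.

    Hypothetical helper, not anything from the Deep Blue team. The low end
    pairs the slowest base speed with the largest reduction; the high end
    pairs the fastest base speed with the smallest reduction.
    """
    low = base_low * clock_factor / reduction_high
    high = base_high * clock_factor / reduction_low
    return low, high


# Figures as quoted in the post: 2.0-2.5M nps, half clock, 5x-10x reduction.
low, high = nps_range(2.0e6, 2.5e6, 0.5, 5, 10)
print(f"estimated DB, Jr. speed: {low:,.0f} to {high:,.0f} nps")
```

With those inputs the range comes out at 100,000 to 250,000 nps, which matches the quoted estimate.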

Imagine that you have two comparable programs, A and B.  You run them
against each other and you give A a big hardware advantage.

A will probably beat B most of the time.  What does it mean when I say
this?  It means that there will be lots of games where A wins, and not
very many where B wins.

Attractive losses are rare.  Losses happen because one side gets into a
fatal positional bind, or gets attacked, or lets a passer slip through
in an ending, or gets crappy pawns, or gets a bad minor piece, or gets
ripped tactically, or whatever.  So, if you are the author of B, and
this happens to you, you feel like crap, because your program is playing
like crap.  It is getting cracked pawns, its king safety is proven
deficient, it is being blown out on tactics, it is losing every drawn
ending and drawing (or losing) every won ending, it doesn't understand
good and bad minor pieces, you want to throw the whole thing away and
start over, you want to die.

If you are the author of A, you are feeling pretty proud of yourself.
Everything is working out.  You are getting nice attacks, you are making
good positional judgements, you are saving the hard endings, you are
winning the drawn endings, your opponent is getting his king stuck on g8
with his rook on h8, you get the raging satanic knight on d5, the one
that's worth about 8 pawns, everything is going great, all of your
speculative terms are producing wonderful looking wins, your program is
perfect.

So, if you have a big hardware advantage, of course the opponent's king
safety sucks.  And of course their positional evaluation sucks.  And of
course their search is weak.  And their program doesn't know dick about
minor pieces.  Their entire program looks thin and wimpy.  They are
losing every game, of course they suck.

Of course, if you reverse the hardware gap, everything goes to hell, and
you feel like garbage.

But neither program has changed, and neither should your attitude.  But
it is very hard to convince yourself of this when you are watching your
program get beaten in game after game.

One of the reasons I don't take this DB thing at face value is that I
think it is possible that they got their hardware handicapping wrong.  I
ran a few matches on my home machines, where I gave my program a 2.5:1
hardware advantage against a professional micro, and ran at fast time
controls (5 0 and 30 0), and man, is this good for the ego.
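
To get a rough feel for what a 2.5:1 hardware edge might be worth between otherwise equal programs, here is a minimal sketch. The expected-score formula is the standard Elo logistic curve. The `elo_per_doubling` value is purely an assumption and a placeholder: the post gives no such figure, and the real number depends on the programs and the time control. Both function names are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Standard Elo expected score for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_speedup(speedup, elo_per_doubling):
    """Convert a speed ratio into an Elo edge.

    ASSUMPTION: strength grows by a fixed number of Elo points per doubling
    of speed. That is a rule of thumb, not a measured result from this post.
    """
    return elo_per_doubling * math.log2(speedup)


# 2.5:1 is the ratio from the post; 60 Elo per doubling is a placeholder guess.
edge = elo_from_speedup(2.5, elo_per_doubling=60.0)
print(f"assumed edge: {edge:.0f} Elo, expected score: {expected_score(edge):.2f}")
```

Even a modest per-doubling value puts the faster side clearly above 50%, and over a short match that can look like one program simply understanding chess better than the other.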

Another is that DB is supposedly tuned to run on many processors and get
huge NPS.  I find it hard to believe that the *design* is so good that
they could back it off until it was at horsepower *equivalent* to a
micro and still beat them.  I think it is likely that the deck was
stacked in favor of DB, however unintentionally.

This is probably part of the reason they didn't want to release these
results: they weren't produced scientifically.  It's unclear if the
experiment was well controlled, we don't have the games, and nobody else
can try to reproduce the results.

We'll have to wait and see if we can ever get our hands on the thing,
but in the meantime I am more or less ignoring these results.  I don't
think there is any other sensible way to handle this.

bruce