Computer Chess Club Archives



Subject: Re: Wanted: Deep Blue vs. today's top programs recap

Author: Robert Hyatt

Date: 11:43:34 08/27/01



On August 27, 2001 at 13:35:59, Tom Kerrigan wrote:

>On August 27, 2001 at 08:59:00, Robert Hyatt wrote:
>
>>On August 27, 2001 at 04:14:33, Tom Kerrigan wrote:
>>
>>>There are some issues here that have not received due attention.
>>>
>>>First, [as most of you already know,] part of DB's search algorithms and all of
>>>DB's evaluation function algorithms were implemented in custom VLSI chips. This
>>>made it phenomenally fast and also means that it can't exist as a PC program
>>>(because you don't have the chips). However, PCs have general purpose
>>>processors, which means they can run any algorithm you can think of, so the idea
>>>of running DB on a PC isn't quite as stupid as most people seem to think, if
>>>you're talking about the algorithms. There are two issues at play when
>>>discussing implementing DB as PC software:
>>>
>>>1) Work involved. Speaking from experience, the time-consuming part of writing
>>>an evaluation function is not the actual coding, but instead deciding which
>>>terms to include and what their weights should be. If you already know _exactly_
>>>what an evaluation function is supposed to do (and the DB team does), I bet
>>>implementing even the most complicated one would only take a couple of weeks.
>>>Moreover, I believe that most, if not all, of DB's evaluation function already
>>>exists as software. It takes months, if not years, to design a chip of this
>>>sophistication, and more months and millions of dollars to get it manufactured.
>>>It's absurd to think that anybody would put this much effort and money into a
>>>project without first experimenting with the algorithms to see if they were any
>>>good. Additionally, it has been reported that the evaluation function was being
>>>tuned by the DB team long before they had any of their chips manufactured.
>>>Exactly what were they tuning, then, if not a software implementation?
>>
>>According to what they have published, they were tuning the "weights".  There
>>is a _huge_ difference between having code that will take weights and produce a
>>score based on a position, and having code that will do this in a chess engine.
>>The difference is the speed requirement.  If the hardware can do over 1B
>>evaluations per second, and software can do 1000, then the two are certainly
>>not comparable if they are going to be used in a chess engine.
>
>I'm glad to see that we now agree they had a software implementation of the
>evaluation function (which you categorically denied in earlier conversations).

I do not call this a "software implementation of the evaluation function."  Not
in any form.  It was a software "emulation" (for lack of a better word) that,
in a very clumsy way, would look for the positional features the hardware could
recognize, and then execute the same summations that the hardware would do.  It
was not designed to be attached to a search.  It was just designed to
brute-force take positional features and treat them just like the hardware would
do, so that they could tune the weights without having the hardware tied up
doing so.

If you want to call that a "software implementation," I suppose you can.
I like "hardware emulation in software" better.  And it could _not_ be tied
into a search engine.  Nor would anyone want to do that, because they would
prefer to re-write it wrapped around a search engine in a far more efficient
form than what was needed for weight tuning.
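For readers who want to picture what that sort of tuning harness might look
like, here is a minimal, hypothetical C sketch of the "find the positional
features, then execute the summations" idea described above.  It is not the
Deep Thought/Deep Blue code; the feature names, counts, and weights are all
invented for illustration.

#include <stdio.h>

/* invented feature list: stand-ins for whatever patterns the hardware
   recognized */
enum { F_PASSED_PAWNS, F_ROOK_OPEN_FILE, F_KING_SHELTER, NUM_FEATURES };

/* the "clumsy" part: a brute-force scan fills in how often each pattern
   occurs in the position.  Speed is irrelevant for tuning, so nothing is
   cached or staged. */
static void extract_features(int feature[NUM_FEATURES]) {
    feature[F_PASSED_PAWNS]   = 1;   /* placeholder counts; a real scan  */
    feature[F_ROOK_OPEN_FILE] = 2;   /* would walk the board here        */
    feature[F_KING_SHELTER]   = 3;
}

/* the summation the hardware would do: weight * feature, summed over all
   features */
static int evaluate(const int feature[], const int weight[]) {
    int score = 0;
    for (int i = 0; i < NUM_FEATURES; i++)
        score += weight[i] * feature[i];
    return score;
}

int main(void) {
    int feature[NUM_FEATURES];
    int weight[NUM_FEATURES] = { 20, 15, 10 };   /* the values being tuned */
    extract_features(feature);
    printf("score = %d\n", evaluate(feature, weight));
    return 0;
}

A tuner only needs this to produce the same number the hardware would produce
for a given position and weight set; it never has to be fast enough to sit
inside a search.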




>And exactly what are you trying to explain with this "huge difference"? Don't
>evaluation functions in chess programs also take weights and produce scores (if
>not, what DO they do?)? So all you're saying is that they _may_ have implemented
>a function in the dumbest/slowest way possible and it would be unreasonable to use
>this in a competitive chess engine? That's not a hard problem to solve. Besides,
>how were they "tuning" weights with a program that was unbearably slow?

Basically it found patterns and summed weights.  And for second-order stuff,
it found groups of patterns and multiplied/summed more weights.  Without any
regard for doing the same thing 100 times or anything about efficiency.  IE
a lot of my eval code is staged, so that I get the results from pawn scoring
before I do anything else, since I need that pawn information to evaluate
the pieces.

This "code" didn't behave like that.
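By contrast, a staged evaluation of the kind just mentioned computes the
pawn-structure results once, caches them, and feeds them to the piece scoring.
The sketch below is a rough illustration of that staging only; it is not
Crafty's actual code, and every name and constant in it is invented.

#include <stdio.h>

/* cached pawn-structure results, keyed by a pawn hash signature */
typedef struct {
    unsigned long long key;
    int score;          /* pawn-structure score                           */
    int open_files;     /* bitmask of open files, reused by piece scoring */
} PawnEntry;

#define PAWN_TABLE_SIZE 1024
static PawnEntry pawn_table[PAWN_TABLE_SIZE];

/* stage 1: pawn scoring, done once per distinct pawn structure */
static const PawnEntry *evaluate_pawns(unsigned long long pawn_key) {
    PawnEntry *e = &pawn_table[pawn_key % PAWN_TABLE_SIZE];
    if (e->key != pawn_key) {        /* cache miss: compute and store */
        e->key = pawn_key;
        e->score = 0;                /* a real scan of the pawns goes here    */
        e->open_files = 0x18;        /* placeholder: pretend d and e are open */
    }
    return e;
}

/* stage 2: piece scoring, which consumes the cached pawn information */
static int evaluate_pieces(const PawnEntry *pawns) {
    int score = 0;
    if (pawns->open_files & 0x10)    /* say, a rook bonus on an open e-file */
        score += 25;
    return score;
}

int main(void) {
    const PawnEntry *p = evaluate_pawns(0x123456789ULL);
    printf("eval = %d\n", p->score + evaluate_pieces(p));
    return 0;
}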




>
>BTW, where did you get 1000 from? Did Hsu tell you their software implementation
>did 1000 NPS? Why even _mention_ numbers when you have no clue what they are in
>reality? Of course I sound like an idiot if "software DB" only runs at 1000 NPS,
>but I don't sound nearly as dumb if it actually runs at 100k NPS, do I? And it
>could just as easily be the latter, according to your knowledge.

I have the _code_.  It was on my web site.  It was (or still is) on Tim Mann's
web site (the eval tuner for deep thought, anyway).  As far as the 1000, I got
that from their old code, plus my experience with Cray Blitz.  Which dropped
from 7M to 1K on a PC.  And I didn't do everything _they_ did, by a long
shot...





>
>>>2) Speed. Some would have you believe that DB's evaluation function was
>>>infinitely sophisticated and had every conceivable term known to man implemented
>>>in it. So many terms that they literally didn't know what to do with all of them
>>>(i.e., a lot were zeroed out). This immense complexity is supposedly made
>>>possible by the custom hardware, and any software implementation would run so
>>>slow as to be worthless. Well, I don't believe this.
>>
>>It really doesn't matter what you believe.  Hsu publicly stated that they
>>used less than 50% of the evaluation hardware in the DB 2 chips, when they
>>were preparing for the Kasparov match.  They simply didn't have time to get
>>everything tuned.  I don't see any reason to make a statement like "I don't
>>believe this."  What makes your opinion so valuable here, when you don't know
>>much at all about the team or their product?
>
>My opinion is so valuable because I justify it starting with the next sentence.
>Can you please spare the pedantic, knee-jerk reactions to "I believe <something
>you don't>"?
>


There is no "knee-jerk".  Hsu says "XXX".  You say "I don't believe XXX".  There
is little to justify that when _you_ don't _know_.



>>> It takes effort to
>>>implement an evaluation function term, and it takes much more effort to
>>>implement one in circuitry. Common sense says that the DB team only implemented
>>>terms they thought would be useful, and not every single one they could think
>>>of. Moreover, nobody can think of terms (in DB or otherwise) that are so
>>>computationally intensive that they would cripple a PC program. MChess's
>>>evaluation function is rumored to be phenomenally complex, and it can still
>>>search thousands of nodes per second and compete with the best. The only example
>>>of an awe-inspiring DB evaluation term that Hyatt can think of is that DB
>>>assigns a bonus to a rook on a file that looks like it might open up. That could
>>>really mean anything, and several commercial programmers say they already have
>>>terms that do this.
>>
>>Sorry, but you are way off the planet here.  I've mentioned more things that
>>they do that are computationally complicated.  King-side attacks are one thing
>>they have mentioned more than once at their presentations.  Knowing which
>>pieces directly attack squares around the king, which pieces indirectly attack
>>squares around the king, which squares around the king are weak/critical, etc.
>>All doable in software.  I did some of this in Cray Blitz.  But I don't in
>>Crafty because while it is doable, the computational cost doesn't seem to be
>>offset by the skill gained, when the cost is so high.
>
>You may think the cost is too high, but I know for a fact that there are a ton
>of extremely strong programs out there that have these terms.

Name that "ton".  I've seen Rebel play.  It doesn't.  I have seen most every
micro play and fall victim to attacks, as if to say "I don't understand how all
those pieces are attacking my king-side..."
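To make the "which pieces attack squares around the king" bookkeeping a bit
more concrete, here is a rough, hypothetical bitboard sketch that just counts
direct attackers of the king zone and how many zone squares they hit.  It is
only an illustration of the idea; the zone definition, the weights, and the
example bitboards are invented, and indirect attacks and weak-square detection
are not handled at all.

#include <stdio.h>
#include <stdint.h>

/* portable popcount of a 64-bit bitboard */
static int popcount64(uint64_t b) {
    int n = 0;
    while (b) { b &= b - 1; n++; }
    return n;
}

/* the king square plus its neighbors (squares 0..63, a1 = bit 0) */
static uint64_t king_zone(int ksq) {
    uint64_t k = 1ULL << ksq, zone = k;
    uint64_t not_a = 0xfefefefefefefefeULL, not_h = 0x7f7f7f7f7f7f7f7fULL;
    zone |= (k << 8) | (k >> 8);                       /* up / down              */
    zone |= ((k & not_a) >> 1) | ((k & not_h) << 1);   /* left / right           */
    zone |= ((k & not_a) << 7) | ((k & not_h) << 9);   /* up-left / up-right     */
    zone |= ((k & not_a) >> 9) | ((k & not_h) >> 7);   /* down-left / down-right */
    return zone;
}

/* attacks[i] = bitboard of squares attacked by enemy piece i */
static int king_zone_pressure(int ksq, const uint64_t attacks[], int npieces) {
    uint64_t zone = king_zone(ksq);
    int attackers = 0, hits = 0;
    for (int i = 0; i < npieces; i++)
        if (attacks[i] & zone) {
            attackers++;
            hits += popcount64(attacks[i] & zone);
        }
    /* illustrative scaling: more attackers and more attacked squares,
       bigger penalty */
    return 10 * attackers + 4 * hits;
}

int main(void) {
    uint64_t attacks[2] = { 0x0000000000070707ULL, 0x8080808080808080ULL };
    printf("pressure = %d\n", king_zone_pressure(9 /* b2 */, attacks, 2));
    return 0;
}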






>
>>Potentially open files is computationally costly to compute.  I do some of
>
>Like I said, "potentially open files" can mean anything. How you define it
>determines the computational cost. So far, you haven't said anything specific
>about how the term was defined in DB, so I fail to understand why anybody in
>their right mind would be impressed.


What is there to understand?  A potentially open file is a very concrete
thing, just like an open file or a half-open file is.  No confusing definitions.
No multiple meanings.
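As a point of reference for the analogy, here is a tiny sketch of the two
standard cases, open and half-open files, using one pawn bitboard per side
(the a1 = bit 0 layout is an assumption of the sketch).  The post does not
spell out how DB defined a "potentially open" file, so that case is
deliberately left out here.

#include <stdio.h>
#include <stdint.h>

#define FILE_MASK(f) (0x0101010101010101ULL << (f))   /* every square on file f */

/* open: no pawns of either side on the file */
static int file_is_open(uint64_t wpawns, uint64_t bpawns, int f) {
    return ((wpawns | bpawns) & FILE_MASK(f)) == 0;
}

/* half-open for white: no white pawn, but at least one black pawn
   (the symmetric test gives half-open for black) */
static int file_is_half_open_for_white(uint64_t wpawns, uint64_t bpawns, int f) {
    return (wpawns & FILE_MASK(f)) == 0 && (bpawns & FILE_MASK(f)) != 0;
}

int main(void) {
    uint64_t wpawns = 0x000000000000E700ULL;   /* white pawns, no d- or e-pawn */
    uint64_t bpawns = 0x00E7000000000000ULL;   /* black pawns, mirrored        */
    for (int f = 0; f < 8; f++)
        printf("file %c: open=%d half-open(W)=%d\n", 'a' + f,
               file_is_open(wpawns, bpawns, f),
               file_is_half_open_for_white(wpawns, bpawns, f));
    return 0;
}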



>
>>>In other words, I don't think it would be especially difficult to run DB's
>>>algorithms on a PC. (It would be extremely interesting, too.) There are a few
>>>possible reasons why this hasn't happened yet:
>>>
>>Nobody said it would be difficult.  However, in 1995, Cray Blitz could search
>
>Actually, you said it would be difficult to the point of impossible. Remember a
>couple of years ago when I was discussing implementing DB in software and you
>went on a tirade about how slow circuit simulation is, because apparently you
>believed that the only way to implement DB's algorithms was to simulate the
>chip? I'm glad you've changed your mind about this, too.


Not "difficult to do".  I believe I said "impossibly slow".  There _is_ a
difference.  Everything they do in parallel, you would get to do serially.
All the special-purpose things they do in a circuit, you get to use lots of
code to emulate.  I estimated a slow-down of 1M.  I don't think I would change
this.  Cray Blitz lost a factor of 7,000 from a Cray to a PC of the same
time period.  Solely because of vectors and memory bandwidth.  Crafty on a Cray
gets population count, leading zeros, all for free.  Because there are special-
purpose instructions to do these quickly.  DB was full of those sorts of
special-purpose gates.
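The population-count and leading-zero primitives are a concrete example of the
gap being described: a single instruction on a machine that has them, a loop
(or a table lookup) when emulated in portable software.  A minimal sketch:

#include <stdio.h>
#include <stdint.h>

/* portable software popcount: one loop iteration per set bit */
static int popcount_sw(uint64_t b) {
    int n = 0;
    while (b) { b &= b - 1; n++; }
    return n;
}

/* portable software leading-zero count */
static int lzcnt_sw(uint64_t b) {
    if (b == 0) return 64;
    int n = 0;
    while (!(b & 0x8000000000000000ULL)) { b <<= 1; n++; }
    return n;
}

int main(void) {
    uint64_t occupied = 0x00FF00000000FF00ULL;   /* some bitboard */
    printf("popcount=%d lzcnt=%d\n", popcount_sw(occupied), lzcnt_sw(occupied));
    /* With compiler support, __builtin_popcountll() and __builtin_clzll()
       map to single instructions on hardware that provides them. */
    return 0;
}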




>
>>>a) Laziness or non-interest. Maybe the people on the team have moved on to other
>>>things or just don't feel like doing the work.
>>
>>Rather than laziness, what about "futility"?  IE what would be the point?
>>They have access to the real hardware (DB Jr is still alive and well and
>>providing analysis from time to time at chess tournaments) so why would they
>>want access to something that is so slow as to be useless?
>
>You've spent years building up DB's evaluation function. Surely you can see some
>benefits (even aside from commercial) of having this thing run on widely
>available hardware.


at 1/1,000,000th the speed of the real McCoy?  Again, what would one learn from
such a thing?  What could I learn from working with a 1 nps version of Crafty,
when it is going to run at 1M nodes per second when I get ready to play a real
game?

Not a lot...






>
>>>c) Reputation. Maybe the evaluation function really sucks as much as some people
>>>think, and releasing a program based on it would make DB the laughingstock of
>>>the computer chess community. Even if it's on par with that of the better
>>>commercial programs, some people will be disappointed that it isn't the
>>>revolutionary leap they expected. (There has been a TON of propaganda raising
>>>people's expectations.)
>>
>>Your logic escapes me, totally.  You think that a program that is designed
>>around special-purpose hardware could run on a PC, albeit maybe 1M times
>>slower.  And if that played poorly, it would damage their reputation?  Would
>>it damage the GNU compiler guys if I told the world their compiler can't play
>>chess worth a crap?  They might point out that it wasn't exactly designed to
>>play chess in the first place, it was designed to turn some high-level formal
>>language into machine language for most architectures.
>
>Here's another case where you just make up some idiot number. Where did 1M come
>from? I never even began to agree that "software DB" might run 1M times slower.
>As before, your argument sounds stupid if you just replace your made-up number
>with another, equally-likely made-up number like 1000. And even if software DB
>ran very slow (say, even as slow as MChess), a good eval function is a good eval
>function. If their function is anywhere near as good as you've made it out to
>be, it would be a strong program, and I assume you can think of uses for a
>strong program (as above).
>


We know how DB (single-chip) did when slowed to 1/10th its nominal speed
and played against top commercial programs.  That was reported by me first,
then others asked about it at lectures by the DB team and we got even more
information from those reports.

I am _certain_ that taking DB from hardware to software would cost a lot.
You would lose a factor of 480 because of the chess chips.  You would lose
a factor of 32 because of the SP.  You would lose a factor of something due
to the cost of doing Make/UnMake/Generate/Evaluate in software during the
software part of the search, rather than getting to use the hardware they
had to handle these mundane parts of the software search.  32 X 480 is over
15,000 already.  And it is only going to get worse.
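Spelling out just the arithmetic in that estimate (the third factor is an
assumption added here, since the post leaves it unspecified):

#include <stdio.h>

int main(void) {
    long chip_factor     = 480;  /* the chess chips, searching in parallel      */
    long sp_factor       = 32;   /* the SP nodes hosting them                   */
    long software_factor = 10;   /* ASSUMED: Make/UnMake/Generate/Evaluate done
                                    in software instead of in the hardware      */

    printf("chips x SP           = %ld\n", chip_factor * sp_factor);    /* 15,360  */
    printf("with software burden = %ld\n",
           chip_factor * sp_factor * software_factor);                  /* 153,600 */
    return 0;
}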







>>>Personally, I think the real reason is a combination of these three.
>>>
>>>The next major issue about DB that's been overlooked is its search algorithm.
>>>People know that it searched 200M NPS and Deep Fritz only searches 5M NPS. Well,
>>>that sounds amazing, but it's pretty meaningless to compare NPS rates unless
>>>you're talking about the same program. MChess used to beat up on Fritz pretty
>>>bad while searching a small fraction of the nodes. The Rebel vs.
>>>Crafty-at-an-absurdly-long-time-control matches show the same thing: a program
>>>with a better evaluation function or search can search fewer nodes than the
>>>competition and still win.
>>
>>Earth to Tom.  Better re-check.  Ed won the first game in such a match.  He
>>later played more and didn't win another game.  This is in his Chess 2010
>>experiment, where both programs had a long time per move, not just one of them.
>
>"Earth to Tom"? Yeah, I thought saying that was cool when I was in elementary
>school. Thanks for reminding me why I stopped posting here. I guess the match I
>was thinking of was only one game, but both programs were definitely not on
>equal time controls (that was the entire point of the thing). Here's a web page:
>http://www.rebel.nl/match.htm
>I don't need a specific number of games to prove my point, though, anyway.
>

When your data is flawed, you need more.  Crafty lost one game at a time
handicap.  Ed then played more games with Crafty at that same long time control,
but with Rebel at the same long time limit as well.  And the result was much
different, which suggests that the first (and only) handicap game was a fluke,
which is certainly the most likely explanation.




>>> Most of the "information" we have about DB's search
>>>is speculation. We "know" they didn't use null-move [all the time]. We "know"
>>>they didn't do forward pruning [all the time]. We "know" they used singular
>>>extensions [some of the time]. We actually know for a fact that their search was
>>>massively parallel, meaning their "effective" NPS rates were much lower than
>>>200M.
>>
>>You might "know" that, but nobody else does.  We "know" that their hardware
>>could search at a speed of over 1B nodes per second.  Half the processors
>>searched at 2M, the other half at 2.4M.  480 * 2.2M > 1B.  Hsu reported that
>>he could run the chess chips at about a 70% duty cycle with the SP hardware
>>platform.  That turns into 700M nodes per second (or a bit more).  Perhaps
>>his "200M" already factors in SMP losses.  That's the only way I can see DB
>>getting down to 200M based on the numbers he published about its potential
>>speed, number of processors, etc.  His PhD thesis claimed 20-30% efficiency
>>on the parallel search, which seems reasonable based on my 70% numbers with
>>full shared memory.  That would mean his 200M is already a real lower estimate.
>
>Occam's razor. Why would IBM make a big deal about 200M NPS when they could just
>as easily make a big deal about 1B NPS? Here's another possible explanation: 2M
>is the maximum per chip, and that assumes the fast eval is used for most positions.
>What if, in most circumstances, the fast eval was used, say, half the time?
>Wouldn't that divide your NPS rate by ~3, i.e., 233M NPS? That's pretty close to
>200M.


I won't try to speculate why they reported 200M.  Hsu was a scientist.  With
little to gain by hyperbole.  I suspect he just reported a number that is
"real".  Just as when someone asks "How much faster does crafty search when
using 4 processors?" I generally respond about 3x to 3.2x faster.  Sometimes
4x faster (or more).  Sometimes 2x faster (or less).  I'm not into marketing
hype, so I just post real numbers.  I don't really know how to take my NPS
and translate that into "effective NPS" without running the same search twice,
once with SMP and once without, and then dividing.  In Hsu's case, he may have
had a much better idea of his "losses" due to non-global hashing, no hashing in
the chips, etc...

I've never asked him since 200M is quite large enough for me compared to the
programs of today (or 1997 even more importantly).
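For what it is worth, the figures quoted further up can be combined the same
way.  The sketch below just redoes that arithmetic (chip count, per-chip speed,
the reported duty cycle, and the thesis's 20-30% parallel efficiency); it shows
only that 200M is consistent with reading the published number as an
efficiency-adjusted one.

#include <stdio.h>

int main(void) {
    double chips        = 480;
    double nps_per_chip = 2.2e6;    /* half of the chips at 2M, half at 2.4M */
    double duty_cycle   = 0.70;     /* reported duty cycle on the SP         */
    double eff_low      = 0.20, eff_high = 0.30;   /* parallel efficiency    */

    double raw    = chips * nps_per_chip;   /* ~1.06B nodes per second */
    double usable = raw * duty_cycle;       /* ~740M                   */

    printf("raw = %.0fM, after duty cycle = %.0fM\n", raw / 1e6, usable / 1e6);
    printf("effective at 20-30%% efficiency = %.0fM to %.0fM\n",
           usable * eff_low / 1e6, usable * eff_high / 1e6);   /* ~148M to ~222M */
    return 0;
}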





>
>-Tom


