Computer Chess Club Archives



Subject: Re: Wanted: Deep Blue vs. today's top programs recap

Author: Robert Hyatt

Date: 19:52:41 08/27/01



On August 27, 2001 at 19:40:42, Tom Kerrigan wrote:

>
>Exactly. I've heard you say over and over that DB is vastly different from DT.
>And the code that's on Tim Mann's page is for DT. And it's a program for doing
>automatic tuning against GM games anyway, not the kind of tuning that was
>reportedly done for DB. Is it safe to assume that, because this is the best code
>you can produce, that you don't have _any_ actual DB-related code? And because
>you have to guess at the speed of DB code based on CB speeds, that you don't
>know _any_ specifics of the code they used? If that's the case, and it seems
>like it is, I don't see what business you have making the guesses you've been
>making and passing them off as informed estimates.


Nope.  The DB guys have reported that they used the same approach.  They
obviously had to add code to match what DB1 and then DB2 could evaluate, but I
assumed that was understood.  I see no reason that anybody would take any
great pains to optimize something that is not time-critical in any way, when
they had so much other work to do for the match.  Some "informed guesses" plus
direct information from some of the team members give pretty good info.  I'm
not the _only_ one that has talked to them, asked them questions, then reported
the results here...




>
>>There is no "knee-jerk".  Hsu says "XXX".  You say "I don't believe XXX".  There
>>is little to justify that when _you_ don't _know_.
>
>I said "I don't believe this" to the idea that a software implementation of DB
>would be "so slow as to be worthless." When did Hsu say that a software
>implementation of DB would be so slow as to be worthless? In fact, when did Hsu
>say anything? I did some web searching and all I could find of his was some open
>letters about unrelated issues and an early paper on DB, with the estimate that
>a general purpose CPU would have to run at 10k MIPS to do what the DB chip does.
>Well, CPUs aren't THAT far away from 10k MIPS these days, so if you want to read
>anything into Hsu's words, it seems like he's siding with me.

Today's CPUs _are_ "that far away" from sustained 10 BIPS.  Which is what he
said might be enough.  "might be".  Because in some respects he is just like
me...  He hasn't given a lot of thought to how he would do something in a
software program that he is currently doing in hardware.  My vectorized attack
detection code used 2% of the execution cycles in Cray Blitz.  On the PC this
ballooned to 25%, even after we rewrote it to better suit the PC.  Had you asked
me how it would run prior to that, I would not have thought that a few dozen
clock cycles on a Cray would turn into a few thousand cycles on a PC.  Because
doing that never occurred to me until my chess partner wanted to know "Hey,
I found a good FORTRAN compiler for the 386...  can you compile the pure
FORTRAN version of CB so I can run it at home?"  He found it useless due to the
incredibly slow speed.



>
>(BTW, if you're interested, the same paper says that the DB chip took three
>years to create. This is a far cry from the 9 months that you stated in another
>post.)

You are reading the wrong stuff.  The _first_ DB chip took maybe 3 years,
and if you had read everything he wrote, and attended a lecture or two, you
would know why.  There were some interesting problems he had to overcome that
had nothing to do with chess.  Pads on the chip were too large.  Cross-coupling
between signal lines on the chips was unexpected and required some cute
hardware work-arounds.  Complete batches of chips were botched for various
reasons.

All you have to do is ask him...

DB2 was _definitely_ done in 9 months from concept to production.  His book
will tell the story if/when it is published.






>
>>>You may think the cost is too high, but I know for a fact that there are a ton
>>>of extremely strong programs out there that have these terms.
>>
>>Name that "ton".  I've seen Rebel play.  It doesn't.  I have seen most every
>>micro play, and fall victim to attacks that say "I don't understand how all
>>those pieces are attacking my king-side..."
>
>I won't name the programs because I don't know if the authors would want me to.
>And I wasn't thinking of Rebel.
>
>>What is there to understand?  A potentially open file is a very concrete
>>thing, just like an open file or a half-open file is.  No confusing definitions.
>>No multiple meanings.
>
>Okay, so what is it? Is it one with a pawn lever? Or one without a pawn ram?
>Seems like both of those could be considered potentially open files, and they
>aren't exactly expensive to evaluate.


Says the man that hasn't evaluated them yet.  :)

You have to see if the pawn can advance to the point it can make contact
with an enemy pawn without getting lost.  It is definitely non-trivial.
From the man that _does_ evaluate them now.
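
To give a feel for it, here is a rough sketch of the sort of test I mean, from
white's side.  The names are made up (this is not actual Crafty or DB code),
white_pawn_hangs() is assumed to exist elsewhere in the engine, and it skips
plenty of special cases:

  typedef unsigned long long BB;
  #define FILE_A 0x0101010101010101ULL
  #define FILE_H (FILE_A << 7)

  /* hypothetical: nonzero if a white pawn standing on sq would simply
   * be lost, given the current attack information */
  extern int white_pawn_hangs(int sq);

  /* a file is "potentially open" for white if a white pawn on it can
   * advance until it attacks an enemy pawn on an adjacent file (a
   * lever), without being rammed head-on or lost along the way */
  int file_potentially_open_white(int file, BB wpawns, BB bpawns) {
    BB pawns = wpawns & (FILE_A << file);
    if (!pawns) return 0;                      /* no pawn to advance    */
    int sq = 63 - __builtin_clzll(pawns);      /* most advanced pawn    */
    for (int s = sq + 8; s < 56; s += 8) {     /* push it up the board  */
      BB b = 1ULL << s;
      if (bpawns & b) return 0;                /* rammed: blocked       */
      if (white_pawn_hangs(s)) return 0;       /* lost on the way there */
      BB att = ((b << 7) & ~FILE_H) | ((b << 9) & ~FILE_A);
      if (att & bpawns) return 1;              /* contact: lever exists */
    }
    return 0;
  }

The expensive part is exactly that white_pawn_hangs() test, which needs attack
counts for every square the pawn has to cross.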




>
>>Not "difficult to do".  I believe I said "impossibly slow".  There _is_ a
>>difference.  Everything they do in parallel, you would get to do serially.
>>All the special-purpose things they do in a circuit, you get to use lots of
>>code to emulate.  I estimated a slow-down of 1M.  I don't think I would change
>>this.  Cray Blitz lost a factor of 7,000 from a Cray to a PC of the same
>>time period.  Solely because of vectors and memory bandwidth.  Crafty on a cray
>>gets population count, leading zeros, all for free.  Because there are special-
>>purpose instructions to do these quickly.  DB was full of those sorts of
>>special-purpose gates.
>
>No, you're completely confusing the entire issue. Was DB written in Fortran, or
>Cray assembly? Did it run on a Cray? Does it have anything to do with a Cray?
>Does it even implement the same evaluation function? How about the same search?
>There are enough variables in your "estimation" here to make any legitimate
>scientist puke.


Only those that haven't done this.  DB was written in C.  Plus microcode for
the chess processors (first version).  Plus evaluation tables.  The issues are
the same.  Porting a program from one environment (hardware or vector in my
case) to another (software or non-vector in my case) presents huge performance
problems.  And if the end-result is not important, the "port" will be sloppy
because the goal is to get it done, quickly, period.  Not to make it efficient.




>
>>>You've spent years building up DB's evaluation function. Surely you can see some
>>>benefits (even aside from commercial) of having this thing run on widely
>>>available hardware.
>>
>>at 1/1,000,000th the speed of the real mccoy?  Again, what would one learn from
>>such a thing?  What could I learn from working with a 1nps version of Crafty,
>>when it is going to run at 1M nodes per second when I get ready to play a real
>>game?
>
>Again, assuming your 1M figure is anywhere near accurate. You're claiming that a
>DB node is worth about five thousand (5,000) (!!) "regular" PC program nodes.
>What on EARTH can POSSIBLY take 5,000 nodes worth of computation to figure out?
>You're going to have to do way better than your lame "potentially open file"
>thing to sell that to anyone.


I'm not saying any such thing.  I simply said that they do a _bunch_ of things
in their eval, in parallel.  Not to mention the mundane parts like maintaining
the chess board.  I consider their raw NPS to be 200X more than a traditional
micro of today.  I consider their effective NPS to be 5x more than that, based
on the eval things they can do for nothing that we don't do because of the
costs.
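
Put as simple arithmetic: 200 (raw speed) times 5 (eval work they get for
free) is an effective advantage of roughly 1,000x over a typical micro.  A
back-of-envelope figure, nothing more.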

That's all I have said, although I _have_ said it often.  You are trying to
mix that up with the emulation of their evaluation, which I say would be hugely
slow on today's PCs.  So, to be clear, the hardware they had was quite good.  Any
sort of software emulation would be highly ugly, because things done in
hardware often don't translate "nicely" into software.  The special-purpose
bit counting/finding instructions on the Cray are well-known examples that
take a clock cycle on the Cray, but take dozens of clock cycles on a PC.
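
To make that concrete, here is a generic sketch (not the actual Crafty code) of
what a PC without a hardware popcount has to do in software for something the
Cray does in one instruction:

  typedef unsigned long long BB;

  /* naive version: one iteration per set bit, easily dozens of
   * operations on a crowded board */
  int popcount_naive(BB b) {
    int n = 0;
    while (b) { b &= b - 1; n++; }
    return n;
  }

  /* the "clever" parallel-counting version is still a dozen-plus ALU
   * operations, versus a single machine instruction on the Cray */
  int popcount_swar(BB b) {
    b = b - ((b >> 1) & 0x5555555555555555ULL);
    b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
    b = (b + (b >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((b * 0x0101010101010101ULL) >> 56);
  }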

I don't know how to explain it any better.  Until you have done it, you
might simply be unable to understand it.  I'm not going to keep going over
it however.




>
>>We know how DB (single-chip) did when slowed to 1/10th its nominal speed
>>and played against top commercial programs.  That was reported by me first,
>>then others asked about it at lectures by the DB team and we got even more
>>information from those reports.
>
>No, we don't "know" that. Where are the reports? Where are the game scores?

Someone here can give more information.  I reported on the first 10 game
match.  We later found out there were 40 games.  Someone _else_ found this
out at a lecture by Campbell.  Since he said it, I feel confident that it
happened.  There are _some_ people that can be trusted to be honest.





>
>>I am _certain_ that taking DB from hardware to software would cost a lot.
>>You would lose a factor of 480 because of the chess chips.  You would lose
>>a factor of 32 because of the SP.  You would lose a factor of something due
>>to the cost of doing Make/UnMake/Generate/Evaluate in software during the
>>software part of the search, rather than getting to use the hardware they
>>had to handle these mundane parts of the software search.  32 X 500 is over
>>10,000 already.  And it is only going to get worse.
>
>10k is a _really_ far cry from 1M.

I simply stopped at 10K.  My bit count instruction on the Cray is a couple
of clock cycles.  On the PC it is about a hundred.  That is another factor
of 50.  Now we are at 500K.  I have no idea what problems they would encounter
that would match the problems I found in trying to do some things on a PC that
were trivial on the Cray.  I.e., my mobility was murder on the PC.  On the Cray,
it was basically "free" (qualitative mobility, as I explained it to Vincent
a year ago).
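
Just to line the arithmetic up in one place, using the round numbers above:

  ~500x for the 480 chess chips  x  32x for the SP nodes  =  over 10,000x
  bit counting/finding: ~2 Cray cycles vs ~100 on a PC    =  another ~50x
  10,000 x 50                                             =  ~500,000x

plus whatever Make/Unmake/Generate/Evaluate costs in software, which is how I
end up with an estimate in the neighborhood of 1M.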






> Besides, if you think that DB's algorithms
>are completely worthless if they aren't running on their fast hardware, why
>doesn't that apply to any other PC program? Are they all worthless because they
>don't search 200M NPS? Or because they can be run on slower PCs? Or because they
>will be run on faster PCs in the future? What you're saying is basically, "why
>have a chess program?" I'm surprised you haven't thought of any reasons by now.
>


I have absolutely no idea what you are rambling about.  A chess engine
designed to search on hardware that can do 1K nodes per second is a _far_
different chess engine than one designed to run on hardware that can search
1M nodes per second.  Yes the 1K program will be better at 1M.  But the 1M
program will be far worse at 1K than a program designed for 1K.

Which simply means that part of the design process factors in the speed of the
search.  Or at least good programs do.



>>When your data is flawed, you need more.  Crafty lost one game at a time
>>handicap.  Ed then played more games with crafty at the same time control,
>>but with rebel at that time limit also.  And the result was much different.
>>Which suggests that the first (and only) handicap game was a fluke, which
>>is certainly the most likely truth.
>
>Changing the experiment does not magically invalidate data. If you want to call
>all of your losses "flukes," fine.

One game is completely statistically invalid to predict _anything_.  Which is
why we originally settled on 10 games.

If you can draw conclusions from one game, feel free.  I can't.  I would
prefer 100 or 1000 to get some statistical significance.
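
For a rough feel, assuming an evenly matched pair and independent games, the
standard error of the match score fraction is about sqrt(0.25/n): roughly
+/-0.50 for 1 game, +/-0.16 for 10, +/-0.05 for 100, and +/-0.016 for 1000.
One game tells you essentially nothing; even 10 leaves a very wide band.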



>
>>I won't try to speculate why they reported 200M.  Hsu was a scientist.  With
>
>Why is there any need to speculate? I think I posted a perfectly legitimate
>potential explanation for the number. There are probably more possible
>explanations. Why in the world do you refuse to take his number at face value?
>
>-Tom

I do.  I have read everything he has written.  He gave the speeds of the two
batches of chess processors.  He gave the total number.  He gave the 70% duty
cycle number.  That comes to 700M.  In yet another place (his Ph.D. thesis I
believe) he claimed 20-30% search efficiency for his two-level parallel search.

All of those numbers, taken together, could be used to derive the 200M number
in several different ways.  I suspect my conclusion is closest to the truth.
He has reported depths of 12 plies.  When pressed, he then responded with "yes,
we were doing 12 plies in software plus 5-7 in hardware."  If no one thinks to
ask about his numbers, and just takes them at face value, the conclusions can
be wrong...
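
To lay one such derivation out, using only the numbers above: roughly 700M
nominal (chip speeds x chip count x the 70% duty cycle), times the 20-30%
parallel search efficiency he quoted, gives about 140M-210M, and 200M sits at
the top of that range.  Whether that is exactly how the published number was
arrived at, only they can say.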





