Computer Chess Club Archives


Subject: Re: ASCI White vs. Deep Blue

Author: Vincent Diepeveen

Date: 20:31:42 09/24/01


On September 24, 2001 at 22:17:41, Robert Hyatt wrote:

>On September 24, 2001 at 18:38:38, Vincent Diepeveen wrote:
>
>>On September 23, 2001 at 22:36:38, Robert Hyatt wrote:
>>
>>>On September 23, 2001 at 18:20:30, Vincent Diepeveen wrote:
>>>
>>>>On September 23, 2001 at 15:30:08, Lonnie Cook wrote:
>>>>
>>>>>* It weighs 106 tons
>>>>>
>>>>>* costs 110M for the unit itself (doesn't include the ungodly sum to run it
>>>>>every day)
>>>>>
>>>>>* Has 8,192 IBM Power3 processors
>>>>
>>>>>* 12.3 trillion ops per sec.
>>>>>
>>>>>* took 28 tractor-trailer trucks to deliver
>>>>>
>>>>>this was the part that astounded me. It said it was 1,000 X's faster than Deep
>>>>>Blue!!
>>>>>
>>>>>so we're talking about a machine that in theory could do 200,000,000,000 nps!!
>>>>
>>>>Noop.
>>>>
>>>>IBM power3 processors. i do not know what speed they run at. Let's guess
>>>>they run at 375Mhz. Hehe , a cheated guess kind of.
>>>>
>>>>Now i have some numbers on these processors, but those are a few years old
>>>>of course. These processors suck bigtime of course. NO one wants to run
>>>>on 375Mhz processors nowadays. But well let's assume that at a stupid
>>>>cluster which ASCI white is, that you can get a decent speedup.
>>>>
>>>>Now how fast do i run at 1 node? Well that's like 15k nodes a second.
>>>
>>>That math is bad.  I'll "race" you using any PIV of your choice, me using
>>>an 800mhz 21264 of my choice.  And my lowly 800mhz processor will toast your
>>>doors off.
>>
>>I'll take the bet, not with a P4 but with a 1.4GHz K7.
>>I can already prove, by a kind of induction, that for DIEP the
>>K7 is faster.
>>
>>The much-praised 21164 at 633MHz is, for DIEP, about the same as
>>a PII clocked at 380MHz.
>
>So what.  Your program is 32 bits.  Why should it run a lot faster on a
>64 bit machine.  Mine does.  But mine is 64-bit based...
>
>
>>
>>A PIII is 17.3% faster than that.
>>An athlon is another 7% faster than the PIII
>>An MP athlon is a few % faster than an athlon.
>>
>>I'll equip the K7 MP 1.4GHz with DDR RAM, so that the memory also
>>isn't the weak link.
>>
>>Now the K7 can do at most 3 instructions a clock and comes pretty
>>close to that.
>
>wanna bet?
>
>>
>>The 21264 has a way longer stage but can do 4 instructions a clock.
>
>
>It is not a "way longer stage".  The 264 is a pretty good box.  And is
>getting good results even though its raw clock speed is 1/2 the intel
>limits at the moment.
>
>
>>
>>So on paper an 21264 can be *at most* 33% faster than a processor
>>doing 3 instructions a clock.
>
>
>Nope. The 264 can move 256 bits of data every clock cycle.  The intel machines
>can move 64.  The bus is _far_ faster.
>
>
>>
>>However the 21264 has some bad habits
>>  a) generating assembly for it is hell difficult, i don't doubt
>>     that DEC did a good job here.
>>  b) the penalty for a misprediction is *huge*
>>  c) the 21264 does *not* have more BTB tables than the K7 MP.
>>
>>Now you'll claim crafty is factor 2 faster at it, this means only that you
>>designed your program wrong for the 32 bits processors.
>>
>>Now Alpha 21264 was a pretty good design compared to other processors,
>>according to major experts. too bad that we won't see any alpha processors
>>anymore, perhaps good as well, because the thing sucked bigtime for me
>>always.
>
>
>I don't know where you get this.  Alphas are still shipping and there are
>no public plans to stop 'em at the moment...
>
>>
>>A K7 MP from 1.4Ghz is of course going to blow away an IBM processor
>>from 375Mhz.
>>
>>To the left, to the right, top and bottom.
>>
>>Add to that the huge parallel losses, and that cluster communication is
>>not going to make up for the missing MHz any time soon, and you know how
>>bigtime you're dicked with this machine.
>>
>>A dual 1.4GHz: 2 x 1.4GHz = 2.8GHz.
>>To get to 2.8GHz with 375MHz processors you need about 7.5 of them.
>>
>>However considering parallel loss, loss over the network, you need
>>more like 64 processors or so to make up for that.
>
>
>
>Do you know _anything_ about the SP2 architecture?
>
>No?  Thought so...
>
>
>>
>>>Don't just assume that 375mhz is bad.  The PPC is _not_ a bad machine. I
>>>have run on SP's...
>>
>>You designed a 64 bits program!
>>
>>I do not know which application they planned to run on this thing, but
>>obviously a good programmer can do the same at a dual 1.4Ghz MP K7 easily.
>>
>>Most likely they use some kind of badly programmed thing which works correctly,
>>and the assumption then is that such particle calculations only need
>>to be approximated, simply accepting the error which happens because you can't
>>profit from shared memory!
>>
>>So probably the whole model used at it sucks bigtime.
>>
>>In molecular physics some 'leading' scientists used a linear approximation
>>for matrix calculations. Weird behaviour of the model was then explained
>>by some weird lemmas. Recently, however, a bright doctor, Sieds Zijlstra,
>>showed that by using a better program, without bad approximations but with
>>exact matrix calculations and a way faster programming language
>>library, it was possible to get rid of all the weird behaviour and simply
>>refute all the weird lemmas!
>>
>>Of course you keep hearing reports like these. In short, those machines are
>>probably going to idle, or do useless things like factorizing (which
>>can be done *at least* 10000x faster in self-made hardware with a built-in
>>prime base).
>>
>>Everyone can imagine how the machine started to exist. "We need a super
>>machine that kicks the hell out of everyone, it must be bigger!"
>>
>>Salesman: "Ah you want more processors than anyone else!!"
>>
>>"Sure"
>>
>>Salesman: "8192 sounds ok to you?"
>>
>>"Excellent!"
>>
>>There were of course other problems: this machine needed to be produced
>>by IBM. Some 10000 processor thing from intel already existed.
>>It is called ASCI RED. It had 450Mhz Xeon processors. Now i don't
>>doubt that 375Mhz processors could possibly overwhelm a 450Mhz 32 bits
>>processor (which btw is 20% slower than a K7 MP would be at 450Mhz) at
>>certain applications like 64 bits math.
>>
>>I've been working on Sun processors for years, and forgive me i do not
>>remember the types, but when the department bought new machines, they were
>>300Mhz. At that time i had a 450Mhz pii at home, and it was very quickly
>>clear to me that the latest type SUN processor was a joke compared to
>>the PIIs.
>
>So what?  Sun has _always_ made dog-slow machines.  But that doesn't mean
>everybody else does.  Just hang on to your "can't do that" vision for a while
>longer, and I'll demonstrate to you what a cluster _can_ do, while you try to

Have fun running on a 375Mhz processor.

You're arguing like those Macintosh guys who praise their G4.

Well, my sister has a G4 at 450MHz, and it's a dual.

Happy for my sister; she's a graphics designer, by the way, and the only
reason she has a Macintosh is that she can't work with Windows (yet).

I sent Apple/Motorola a few questions about the CPU.

Simple questions. To this day I have had no answer back from them!

The same will happen with the processors you described.

I mean, a 375MHz processor. How, for Christ's sake, is it *ever* going to
beat a 1.4GHz MP processor or a 2.0GHz P4 processor?
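
As a rough check on the clock-rate argument quoted earlier (a dual 1.4GHz K7 MP against 375MHz Power3 processors), here is a minimal Python sketch. It compares summed clock rates only, which is the post's own simplification; the function name is made up for illustration, and the 375MHz figure is the guess from the quoted text, not a measured value.

```python
def cpus_needed(target_mhz, cpu_mhz):
    """How many cpu_mhz processors add up to target_mhz of summed clock."""
    return target_mhz / cpu_mhz


dual_k7_mhz = 2 * 1400   # dual 1.4GHz K7 MP, clocks simply added together
power3_mhz = 375         # guessed Power3 clock from the quoted text

ratio = cpus_needed(dual_k7_mhz, power3_mhz)
print(f"{ratio:.2f} Power3 processors per dual K7, by clock alone")
```

The result is a little under the 7.5 quoted in the thread. Clock rate alone ignores instructions per clock, memory bandwidth and parallel losses, which is exactly what the two sides are arguing about.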

Bandwidth? Que? I thought we were talking about how fast a chess program
runs on it, not about whether meteorologists are happy. That I can answer
already: meteorologists aren't happy right now, as it's raining
and thundering here!

>argue "but it can't do that."  Just like you argued for months about the
>speedup numbers in Cray Blitz even though I showed you the raw output.  "It

What I say now is that the speedup you get with Cray Blitz at 8 processors
is 6.6, and that on a cluster with 8 nodes you'll never get close to that,
not even close to 4.0, I think, because with Cray Blitz
you had shared memory, and here you do not have shared memory!

So our old bet still stands!

Actually I think it's going to be pretty hard to get better than a
square-root speedup at 8 nodes in the first place: sqrt(8) = 2.83.
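
The three speedup figures in this argument can be put side by side with a small Python sketch. The 6.6 and 4.0 values are the ones stated above, the square-root rule is only the rule of thumb proposed here (not a measured result), and the helper name is made up for illustration.

```python
import math


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


processors = 8
figures = {
    "Cray Blitz, shared memory (figure from the post)": 6.6,
    "cluster ceiling claimed in the post": 4.0,
    "square-root rule of thumb": math.sqrt(processors),
}

for label, speedup in figures.items():
    eff = parallel_efficiency(speedup, processors)
    print(f"{label}: speedup {speedup:.2f}, efficiency {eff:.1%}")
```

Dividing each speedup by the processor count makes the disagreement concrete: the bet is about whether a cluster without shared memory can keep efficiency anywhere near the shared-memory figure.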

Note I wasn't complaining about Cray Blitz but about APHID not being
realistic, because its numbers are based on 8-ply searches.

>just can't do that" is a common theme.  But before you launch into that kind of
>argument, you ought to at _least_ know what you are talking about.  If you
>haven't run on an SP, nor studied the tech references on it, it makes you look
>foolish to dismiss it as useless when some _serious_ computer scientists are
>using these machines daily to solve serious computational problems.  Just
>because _you_ can't use 'em doesn't mean everyone is so limited.

Yes, I laugh at 375MHz processors now that it's September 2001!

To me, building a machine out of 375MHz processors
is like this:

You design the world's biggest aircraft (a superjumbo),
and instead of using gigantic engines like jumbos use,
you propel the world's biggest aircraft with 8192 old
bicycles, like you see so many of in our beloved capital Amsterdam.

>
>
>>
>>No they were not slower than a 266Mhz PII would have been for me. And
>>my code had some things which now would do better at a 64 bits machine
>>but at that time a bit worse so i considered it equally fast to a PII
>>at 300Mhz.
>>
>>But the PII processor was already years old at that time, whereas the
>>brand-new SUN processor was only clocked at 300MHz!!!!!!!!
>
>trash.  so what?
>
>
>>
>>Each workstation (single cpu) was 5 times the price of a PII450 system.
>>
>>Of course that PII450 couldn't be put in a 32 processor shared memory
>>system, which the SUN most likely can be put in.
>>
>>The PII450 isn't hot swappable etcetera.
>>
>>So if you really want to run an application which has been written for
>>a cluster, and then can put it at 8192 processors (which will never
>>be able to get used at the same time i bet. most likely you can at
>>most allocate 1000 processors or so for a single job).
>
>
>How about adopting a new standard for yourself?  Before you say something,
>check it out.  "I bet" is not going to win friends and influence people in
>the world of computing.  "I have shown" is far more convincing.
>
>
>
>
>>
>>In that case there is of course a use in having such a cluster.
>>
>>But the speedup over a dual 1.4 MP will be most likely not
>>even close to a factor 1000.
>>
>>Factor 100 perhaps?
>
>I'll bet that 8K processors can produce a 1000x faster search.  But even
>if it was only 500 times faster, that will still cook your goose for Sunday.
>
>
>
>
>>
>>Pay a programmer a bit and it's a factor 30 perhaps?
>>
>>Now if this process runs for a week, then for research institutes there
>>is of course a big advantage, because you are 30 weeks faster!
>>
>>You need 1 week instead of 30.
>>
>>Obviously there is a use here to make a huge system, but i would be
>>pretty amazed if it's getting used like that.
>>
>>Most likely 100 scientists get a kick out of each getting 64 processors
>>from an 8192-processor machine!
>
>Your "most likely" is garbage.  Why don't you ask someone at one of the
>SP2 computer sites?  I know some I will be happy to put you in touch with.
>Maybe some of the guys up at Oak Ridge will give you _real_ data to erase
>your bad guesses...
>
>Want some names???
>
>
>
>
>>
>>The only real advantage of this machine is again for the meteorologists,
>>who can use big memory, big storage, and big bandwidth.
>>
>>But well. They don't need many processors. Just a huge RAM memory!
>>
>>The bottom line is that compared to a 1.4Ghz MP, they already need
>>16 times more processors for each MP you would use!
>>
>>If a scientist allocates 32 processors with an application that's only
>>needing processor power, then a dual 1.4 will be faster for them!
>>
>>If they need its bandwidth, why then create a machine with so many
>>processors?
>
>
>You are getting to the issue.  Maybe because they _need_ that much computational
>power.
>
>
>
>
>>
>>>>Still probably optimistic number of nodes a second.
>>>>So at 8192 processors, from which you can perhaps use a 1000 at a time,
>>>>I would get 15M nodes a second.
>>
>>>>Now that looks great, but that's of course on a CLUSTER. Speedup perhaps
>>>>10%. 1.5M nodes a second effectively, but the bigger the depth the less
>>>>the speedup gets as the branching factor will be worse, unless i accept
>>>>that the thing first slows down at each processor (which is a likely
>>>>approach) and pray that the latency is more than fast at this thing.
>>>>
>>>>So you sure outsearch deep blue by many plies, but not if a new deep
>>>>blue would be pressed on a chip using nullmove and DDR-RAM at it.
>>>>
>>>>So you are not faster in NPS, but search improvements would let it
>>>>search deeper. that still wouldn't make my DIEP faster on this machine
>>>>than DB was in nodes a second.
>>>>
>>>>Of course DBs focus upon only getting the maximum number of NPS (that's
>>>>how they advertised the thing. search depths have no commercial value)
>>>>sure made it faster than what i would get on this machine.
>>>>
>>>>>Is this really so for those in the know with hardware and these types of
>>>>>machines?
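
The back-of-the-envelope estimate in the quoted text above (15k nodes a second per processor, about 1000 usable processors, roughly 10% of that surviving on a cluster) can be reproduced with a short Python sketch. Reading "speedup perhaps 10%" as a 10% parallel efficiency is an interpretation, the Deep Blue figure is only what the "1,000 X's faster" claim implies, and all names are made up for illustration.

```python
def effective_nps(nps_per_cpu, cpus, efficiency):
    """Nodes per second left after scaling by cpus and parallel efficiency."""
    return nps_per_cpu * cpus * efficiency


NPS_PER_CPU = 15_000        # DIEP on one 375MHz processor, per the quoted text
USABLE_CPUS = 1_000         # processors assumed usable by a single job
CLUSTER_EFFICIENCY = 0.10   # "speedup perhaps 10%", read as efficiency

# Deep Blue rate implied by "200,000,000,000 nps" being 1000x faster
DEEP_BLUE_NPS = 200_000_000_000 // 1_000

raw = effective_nps(NPS_PER_CPU, USABLE_CPUS, 1.0)
effective = effective_nps(NPS_PER_CPU, USABLE_CPUS, CLUSTER_EFFICIENCY)

print(f"raw: {raw:,.0f} nps")
print(f"effective: {effective:,.0f} nps")
print(f"implied Deep Blue rate is {DEEP_BLUE_NPS / effective:.0f}x the effective rate")
```

Under these assumptions the effective rate stays far below the implied Deep Blue rate, which matches the claim that DIEP on this machine would not be faster in nodes a second, whatever search improvements do for depth.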


