Computer Chess Club Archives


Subject: Re: Who can update about new 64 bits chip?

Author: Vincent Diepeveen

Date: 09:21:49 08/25/02



On August 25, 2002 at 11:21:31, Dan Andersson wrote:


>>If you look at my response to Vincent, there are a few issues that need
>>addressing because NUMA introduces a much higher memory latency for
>>processors that are farther away from the actual memory word needed.  This
>>has to be addressed to produce reasonable speedups.
>>
>You say much higher. The latency for one CPU is 80 ns. For MP one hop is 115 ns,
>two hops 150 ns and three are less than 190 ns. That doesn't seem like much to
>me. Putting more information in a hash return could amortize the cost to almost
>nothing it seems.

I need to note that the difference between 150 ns and 80 ns is nearly
a factor of 2. If that is not *considerably slower*, then what is?
Memory that gets 2 times slower, that is a *big* slowdown.

That is of course the theoretical latency, and probably only true
for sequential data streams, which chess programs are *not* busy
with. What we need is the latency of a random cache line access.

If you run a test yourself on a K7, you will find that fetching a random
cache line takes around 300-400 clocks on a CPU of around 1.4 GHz.

So that is nowhere *near* 80 ns.
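
For reference, here is roughly the kind of test I mean: a minimal
pointer-chasing sketch that measures random cache line latency. The
buffer size, line size and iteration count are just example values,
and clock() is only a rough portable timer, so take the exact numbers
with a grain of salt.

/* Minimal pointer-chasing sketch: measure random cache line latency.
   LINE, LINES and STEPS are arbitrary example values. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE   64          /* assumed cache line size in bytes */
#define LINES  (1 << 20)   /* 64 MB buffer, far bigger than the caches */
#define STEPS  (1 << 24)   /* number of dependent loads to time */

int main(void) {
    char   *buf  = malloc((size_t)LINES * LINE);
    size_t *perm = malloc((size_t)LINES * sizeof *perm);
    size_t i, j, tmp;

    /* Random permutation of the line indices, so every load hits an
       unpredictable line and the hardware prefetcher cannot help. */
    for (i = 0; i < LINES; i++) perm[i] = i;
    srand(1);
    for (i = LINES - 1; i > 0; i--) {
        j = (size_t)rand() % (i + 1);
        tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
    }

    /* Store, at the start of each line, the byte offset of the next
       line in the permuted order: one long dependent chain. */
    for (i = 0; i < LINES; i++)
        *(size_t *)(buf + perm[i] * LINE) = perm[(i + 1) % LINES] * LINE;

    size_t  p  = perm[0] * LINE;
    clock_t t0 = clock();
    for (i = 0; i < STEPS; i++)
        p = *(size_t *)(buf + p);   /* each load depends on the previous one */
    clock_t t1 = clock();

    double ns = (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)STEPS;
    printf("~%.0f ns per random cache line access (p=%zu)\n", ns, p);

    free(perm);
    free(buf);
    return 0;
}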

If Hammer delivers a random cache line in 80 ns I would be pretty amazed,
as at 2 GHz that is only about 160 clocks.

Obviously that would be more than 2 times faster than today's K7s manage.

In short, I don't believe the 80 ns at all, because AMD themselves say
they expect latency to get about 25% better for their Hammer CPU compared
to a K7. So even taking the higher-clocked CPU into account, we are still
talking about ~400 clocks for a random access to memory.
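
The conversion from nanoseconds to clocks is just the latency times the
clock rate in GHz; a quick sketch using the figures from this thread
(the clock speeds are only example values):

#include <stdio.h>

int main(void) {
    double ghz[] = { 1.4, 2.0 };                  /* example K7 and Hammer clocks */
    double ns[]  = { 80.0, 150.0, 300.0, 600.0 }; /* latencies mentioned in this thread */
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 4; j++)
            printf("%5.0f ns at %.1f GHz = %4.0f clocks\n",
                   ns[j], ghz[i], ns[j] * ghz[i]);
    return 0;
}

For example, 80 ns at 2 GHz is 160 clocks, and 600 ns at 2 GHz is already
1200 clocks.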

>>There are plenty of NUMA machines around today, notably from SGI, which can
>>be used to understand these issues...

>And we have those and an IBM system of certain chess fame at the university.

And Diep is running on SGI machines, to be clear, so I know *very* well what
NUMA means. The hardware they use is very sophisticated. A hop on their
hardware adds 50 ns, which is less than the numbers quoted above,
but that is also *sequential* speed.

The real latency of READING a random 128-byte cache line (or a variable
in that cache line, obviously) on a 4-node dual machine (8 processors)
is about 600 nanoseconds. That is *very good*. I am sure the average
person cannot afford the same quality SN0 routers that SGI is using.

Of course cache lines on Hammer might be smaller than the SGI ones, but
that still does not take away that fetching a remote cache line is easily
going to eat 2000 CPU clocks, versus 400 now. That is a factor of 5 difference.




>>BTW memory latency has been constant for 20+ years now.  But as the speed of
>>the processors goes up, memory becomes relatively slower.  So long as we
>>use capacitors for bit storage, this is going to continue to remain a big
>>issue.
>Yep. But I was commenting on the specifics of the Hammer implementation. And the
>memory controller is on chip and some of the overhead will be reduced due to the
>fact that parts of it will run faster at higher speeds. It won't reduce the
>memory specific latency though. And the speedup will be bounded by the memory
>subsystem.
> As for the cost of the system, I wouldn't know. But this is a commodity level
>system. The HyperTransport bus could end up being mass produced on a scale never
>seen before. Excepting the PCI bus maybe.
> When I look at it, it feels like I'm getting another chance at acquiring a
>Transputer based computer.
>
>MvH Dan Andersson


