Computer Chess Club Archives


Subject: Re: CB sponsors entry of _four_ programs to WMCCC 2000!?

Author: Robert Hyatt

Date: 20:37:00 08/12/00


On August 12, 2000 at 08:54:10, David Blackman wrote:

>On August 09, 2000 at 09:48:21, Robert Hyatt wrote:
>
>>Another good comparison:  Cray-1 took 7-8 clocks to read from memory.  The XMP
>>took this to 14 clock cycles or so, all the way to the C90 which took this to
>>25 clock cycles.  If you take cpu cycle time, and then memory access time, you
>>get a flat line for memory speed.  As the cpu got faster, it simply took more
>>cycles to read from memory.  ie 7*12.5 (cray 1 clock) is about 14*8, the XMP
>>time, which is about 25*4, the C90 time...
>
>The Crays aren't a very good example for memory latency. The Cray-1 and early
>XMPs were built with incredibly fast RAM for the time, the stuff other companies
>used for 1st level cache. And even with the price people paid for a Cray-1 they
>still only got about 32MB of the stuff.

This was only true for very early Crays. By the time the Cray-1S rolled off
the assembly line, it was using DRAM rather than the bipolar memory of the
original Cray-1.  Bipolar simply wasn't dense enough to provide the huge
memory sizes that even 1980 Crays had to have (16 megawords, 8 byte words,
was considered so-so at that point).


>
>Later it was realised that supercomputer customers really wanted enormous
>amounts of RAM, that they could live with more clock cycles of latency as long
>as bandwidth was good, and that if you tried to cram gigabytes of RAM into a
>computer latency would get worse, due to both cost and engineering constraints.
>So by the time you get to the C-90, you do get gigabytes of RAM, at similar
>price to a Cray-1, but despite 15 years newer technology, the latency is only a
>little better. Of course bandwidth is out of this world, and long vector stuff
>is amazing.
>
>In the PC and workstation world, there was some of the same effect up to about
>1995, but then they woke up that latency was still hurting them even with caches
>and high bandwidth cache line loads. When PCs need performance, it's for
>different workloads than supercomputers:-). Since 1995 latency has definitely
>improved on PCs. (I haven't used a workstation since then, but I bet they have
>improved too.)
>


I haven't seen _any_ latency improvements since 1995.  Don't get fooled by
SDRAM and the like streaming memory into the L1 cache.  Write a program
that addresses words that are scattered randomly over memory.  Then that
fancy streaming/buffering is worthless and you get a feel for true memory
latency.  And you'll see that the 100ns number is pretty close to the truth.
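A minimal sketch of such a test in C, under stated assumptions: the names
(build_random_cycle, chase, ns_per_load) and the xorshift generator are
illustrative choices, not taken from any existing program. Each load depends
on the result of the previous one, so streaming and prefetching cannot hide
the miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift64 generator (assumed here so the result does not depend
   on the platform's RAND_MAX). */
static uint64_t rng_state = 88172645463325252ULL;

static uint64_t xorshift64(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Link the n slots of `next` into one random cycle (Sattolo's algorithm),
   so following next[i] visits every slot once in a scattered order. */
void build_random_cycle(size_t *next, size_t n)
{
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % i);  /* j in [0, i-1], never i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Perform `steps` dependent loads starting at slot 0.  The final index is
   returned so the compiler cannot discard the loop. */
size_t chase(const size_t *next, size_t steps)
{
    size_t i = 0;
    while (steps--)
        i = next[i];
    return i;
}

/* Average nanoseconds per dependent load over a table of n words. */
double ns_per_load(size_t n, size_t steps, size_t *sink)
{
    size_t *next = malloc(n * sizeof *next);
    assert(next != NULL);
    build_random_cycle(next, n);

    clock_t t0 = clock();
    *sink = chase(next, steps);
    clock_t t1 = clock();

    free(next);
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)steps;
}
```

Called with a table far larger than the biggest cache, for example
ns_per_load((size_t)1 << 24, 20000000, &sink), the result approximates
main memory latency.  A table of a few thousand words stays in cache and
shows the cache latency instead.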




>To get back to computer chess at least a little, I'd guess that latency is a bit
>more important than bandwidth. The stuff you want from main memory is usually no
>bigger than a cache line, and you want it on such short notice it's hard to hide
>it in the pipeline.


This is true for 'big' programs.  And obviously true for hash probes.  The
way to optimize for the PC has everything to do with temporal locality of
memory references, to optimize cache fetches.
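For the hash probe case, one way to limit the damage is to make sure a probe
touches only one cache line.  The sketch below is an illustration, not the
layout of any particular program: the Entry fields, the 64-byte line size,
the 4-slot bucket and the replacement rule are all assumptions (older CPUs
used 32-byte lines, so CACHE_LINE would need adjusting).

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE       64   /* assumed line size */
#define SLOTS_PER_BUCKET 4

typedef struct {
    uint64_t key;             /* hash signature; 0 marks an empty slot */
    int16_t  score;
    uint16_t move;
    uint8_t  depth;
    uint8_t  flags;
    uint8_t  pad[2];
} Entry;                      /* 16 bytes */

typedef struct {
    alignas(CACHE_LINE) Entry slot[SLOTS_PER_BUCKET];
} Bucket;                     /* exactly one cache line */

_Static_assert(sizeof(Entry) == 16, "entry should be 16 bytes");
_Static_assert(sizeof(Bucket) == CACHE_LINE, "one bucket per cache line");

typedef struct {
    Bucket *buckets;
    size_t  mask;             /* bucket count - 1, count a power of two */
} Table;

/* Allocate a zeroed, line-aligned table.  Returns 0 on success. */
int table_init(Table *t, size_t bucket_count)
{
    assert(bucket_count && (bucket_count & (bucket_count - 1)) == 0);
    t->buckets = aligned_alloc(CACHE_LINE, bucket_count * sizeof(Bucket));
    if (t->buckets == NULL)
        return -1;
    memset(t->buckets, 0, bucket_count * sizeof(Bucket));
    t->mask = bucket_count - 1;
    return 0;
}

/* All slots examined sit in the same line, so a probe that misses the
   cache costs one memory access rather than several. */
const Entry *table_probe(const Table *t, uint64_t key)
{
    if (key == 0)
        return NULL;
    const Bucket *b = &t->buckets[key & t->mask];
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
        if (b->slot[i].key == key)
            return &b->slot[i];
    return NULL;
}

/* Overwrite a matching key, otherwise the shallowest slot in the bucket
   (empty slots have depth 0 and therefore go first). */
void table_store(Table *t, uint64_t key, int16_t score, uint16_t move,
                 uint8_t depth)
{
    Bucket *b = &t->buckets[key & t->mask];
    Entry *victim = &b->slot[0];
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        Entry *e = &b->slot[i];
        if (e->key == key) { victim = e; break; }
        if (e->depth < victim->depth) victim = e;
    }
    victim->key = key;
    victim->score = score;
    victim->move = move;
    victim->depth = depth;
}
```

The point of the layout is that the alternatives for one position are
neighbours in memory, so the latency is paid once per probe instead of once
per slot.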


>
>For my program, main memory speed is not a big factor yet. That's because it's
>small enough to fit in cache except for hash tables, and it's slow enough that
>it can't drive the hash tables any faster than memory can handle. But for larger
>and/or faster programs, main memory must be starting to bite.


You will see it more and more as you add things...  particularly when the
code size blows out L1 completely.  L2 is faster than memory, but the latency
is gross even so.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
