Computer Chess Club Archives




Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Robert Hyatt

Date: 06:30:36 07/15/03


On July 15, 2003 at 06:20:28, Vincent Diepeveen wrote:

>On July 14, 2003 at 15:33:37, Gerd Isenberg wrote:
>>On July 14, 2003 at 10:54:49, Vincent Diepeveen wrote:
>>>On July 13, 2003 at 17:10:10, Russell Reagan wrote:
>>>>On July 13, 2003 at 13:17:56, Bas Hamstra wrote:
>>>>>It is used *extremely* intensively. Therefore I assumed that most of the time the
>>>>>table sits in cache. But apparently not... Makes you wonder about other simple
>>>>>lookups. A lot of 10-cycle penalties, it seems.
>>>>Hi Bas,
>>>>Why do you say "10 cycles"? I thought memory latency was many more cycles (~75 -
>>>A random read from memory on a dual P4 or dual K7 takes nearly 400 nanoseconds.
>>>At 2GHz that's around 800 cycles.
>>>Best regards,
>>Hi Vincent,
>>puhh... that's about 1/2 microsecond. I remember the days with
>>2MHz - 8085 or Z80 CPU - can't believe it. A few questions...
>>I'm not familiar with dual architectures. Is it a kind of shared memory via
>>the PCI bus? How do you access such RAM - are there some alloc-like API functions?
>>What happens if one processor writes this memory through cache - what about
>>possible cached copies of this address in the other processor, or in general how
>>do the several processor caches synchronise?
>>I guess each processor has its own local main memory.
>>Do you know the read latencies of single processor P4 or K7 with state of the
>>art chipsets?
>Yes, single-CPU P4s were tested at around 280 ns here.
>For random reads. Read me well. I will ship you source code to test it.
>You can see for yourself then.
>It really is that bad.
>This is why Intel had planned huge L3 caches for future Itanium processors,
>like up to 24MB or something in a few years.
>>1.) if data is already in 1. level cache
>I need to quote from memory now: P4 needs 2 cycles and K7 3
>>2.) if data is in 2. level cache but not in 1.
>Depends upon the processor. For example, the diagram shown by Priestly at the
>conference (marketing manager from Intel) shows the following for Itanium2-madison with
>6MB L3 cache (which is the fastest processor they've got; the 1.3 budget processors
>are worse than this):
>L3 cache  3/6MB  128B CL  24-way  14-17 clks  (arrows from/to L2 cache)
>L2 cache  256KB  128B CL  8-way   5-7 clks    (arrows from/to L1 cache)
>L1I       16KB   64B CL           1 clk
>L1D       16KB   64B CL           1 clk
>I have his presentation sheets in PDF format here. If you want, I can send them
>to you.
>>3.) in worst case, if data is only in main memory but in no cache
>That depends upon whether the line to memory is already opened. So sequential
>reads have *considerably* lower latency than random reads.
>For a single-CPU P4 a random lookup in memory is around 300 nanoseconds.
>For a dual it's nearly 400 nanoseconds, as I measured.
>If you want, you can have this source code to measure the average lookup
>speed. It is influenced so much by random lookups into memory that the random
>lookups of case 3 dominate the results there.
>I am talking about measured values. Not guessed values like Hyatt is talking

You are talking about utter crap, not measured values.  lmbench is the
industry-accepted measure for random access memory latency, memory bandwidth,
cache bandwidth, cache latency, and everything else.

Ramble all you want.  _anybody_ can look up lmbench on the web and see
that what you are writing is utter nonsense.

All they have to do is _look_.

I wouldn't trust a program you wrote to add 2+2 correctly, much less try to
measure any sort of latency.  To do that requires hardware understanding
_first_.  Which is sadly lacking on your part.

But rather than continue this, all anybody has to do is to download the
program _everybody_ uses to measure memory performance, and then run it.
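
For readers who want to see what such a measurement involves, here is a minimal sketch in C of a dependent-load ("pointer-chasing") latency test, the general style of test that memory-latency benchmarks rely on. It is not lmbench's code, and it is not the program offered in the quoted message. All function names (ns_to_cycles, build_cycle, chase, measure_latency_ns) are made up for this illustration, and it assumes a POSIX system that provides clock_gettime. The small ns_to_cycles helper only restates the arithmetic quoted in the thread (400 ns at 2GHz is about 800 cycles).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std=c99/c11 */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Latency in ns -> clock cycles: a 1 GHz clock ticks once per nanosecond. */
double ns_to_cycles(double ns, double ghz) { return ns * ghz; }

/* Small xorshift generator, so the shuffle does not depend on RAND_MAX. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return *s = x;
}

/* Fill next[0..n-1] with one random cycle (Sattolo's algorithm): following
   next[] from any slot visits every slot exactly once before coming back. */
void build_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(next != NULL);
    uint64_t s = seed ? seed : 1;            /* xorshift state must be nonzero */
    for (size_t i = 0; i < n; i++) next[i] = i;
    if (n < 2) return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % i);   /* 0 <= j < i */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
}

/* Each load depends on the previous one, so the CPU cannot overlap them. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    while (steps--) p = next[p];
    return p;
}

/* Average ns per dependent load over a working set of n slots.
   Returns -1.0 on bad arguments or allocation failure. */
double measure_latency_ns(size_t n, size_t steps)
{
    if (n == 0 || steps == 0) return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (!next) return -1.0;
    build_cycle(next, n, 12345u);

    volatile size_t sink = chase(next, 0, n);   /* warm-up: touch every slot */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(next, sink, steps);            /* volatile keeps the loop alive */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(next);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Calling measure_latency_ns with a working set far larger than the last-level cache (say, 64MB worth of slots) approximates case 3.) above, while a set of a few KB approximates case 1.). The figure for a large set includes TLB misses as well as DRAM access time, which is one reason numbers from different tools can disagree; treat the result as a rough indication, not a calibrated measurement.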

>>Thanks in advance,


Last modified: Thu, 07 Jul 11 08:48:38 -0700
