Computer Chess Club Archives




Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Vincent Diepeveen

Date: 03:20:28 07/15/03


On July 14, 2003 at 15:33:37, Gerd Isenberg wrote:

>On July 14, 2003 at 10:54:49, Vincent Diepeveen wrote:
>>On July 13, 2003 at 17:10:10, Russell Reagan wrote:
>>>On July 13, 2003 at 13:17:56, Bas Hamstra wrote:
>>>>It is used *extremely* intensively. Therefore I assumed that most of the time the
>>>>table sits in cache. But apparently not... Makes you wonder about other simple
>>>>lookups. A lot of 10-cycle penalties, it seems.
>>>Hi Bas,
>>>Why do you say "10 cycles"? I thought memory latency was many more cycles (~75 -
>>A random read from memory on a dual P4 or dual K7 takes nearly 400 nanoseconds.
>>So at 2 GHz that's around 800 cycles.
>>Best regards,
>Hi Vincent,
>puhh... that's about 1/2 microsecond. I remember the days with
>2 MHz 8085 or Z80 CPUs - can't believe it. A few questions...
>I'm not familiar with dual architectures. Is it a kind of shared memory via
>the PCI bus? How do you access such RAM - are there some alloc-like API functions?
>What happens if one processor writes this memory through its cache - what about
>possible cached copies of this address in the other processor, or in general, how
>do the several processor caches synchronise?
>I guess each processor has its own local main memory.
>Do you know the read latencies of a single-processor P4 or K7 with state of the
>art chipsets?

Yes, single-CPU P4s were measured at around 280 ns here.

That is for random reads. Read me well. I will send you source code to test it,
so you can see for yourself.

It really is that bad.

This is why Intel planned to give future Itanium processors huge L3 caches,
up to 24 MB or so, within a few years.

>1.) if data is already in 1. level cache

Quoting from memory: the P4 needs 2 cycles and the K7 needs 3.

>2.) if data is in 2. level cache but not in 1.

That depends on the processor. For example, the diagram shown at the conference
by Priestly (a marketing manager from Intel) gives the following for the
Itanium2-Madison with 6 MB L3 cache (which is the fastest processor they have;
the 1.3 budget processors are worse than this):

Cache  Size    Line size  Assoc.   Latency
L3     3/6MB   128B       24-way   14-17 clks  (arrows from/to L2 cache)
L2     256KB   128B       8-way    5-7 clks    (arrows from/to L1 cache)
L1I    16KB    64B                 1 clk
L1D    16KB    64B                 1 clk

I have his presentation slides in PDF format here. If you want, I can send them
to you.

>3.) in worst case, if data is only in main memory but in no cache

That depends on whether the line to memory is already open. So sequential
reads have *considerably* lower latency than random reads.
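
To make the sequential-versus-random difference concrete, here is a minimal pointer-chasing sketch in C. It is not the test program offered in this post; all names (build_sequential_cycle, build_random_cycle, chase, ns_per_read) are made up for illustration, and it uses the standard clock() call, which measures CPU time and is only a rough timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small xorshift64 generator (illustrative); the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Link the slots in order: 0 -> 1 -> ... -> n-1 -> 0. */
static void build_sequential_cycle(size_t *next, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = (i + 1) % n;
}

/* Link the slots into one random cycle through all n entries
   (Sattolo's algorithm), so each read lands at an unpredictable slot. */
static void build_random_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(n > 0 && seed != 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i);   /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the links for 'steps' reads; every read depends on the previous one,
   so the reads cannot overlap. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}

/* Rough average time per dependent read, in nanoseconds. */
static double ns_per_read(const size_t *next, size_t steps)
{
    assert(steps > 0);
    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);   /* keep the result alive */
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

To try it, allocate an array much larger than the caches, build one of the two cycles, and call ns_per_read with a few million steps. Note that the sequential walk also touches each cache line several times in a row, so it shows more than just the open-line effect.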

For a single-CPU P4, a random lookup in memory takes around 300 nanoseconds.
For a dual it's nearly 400 nanoseconds, as I measured.
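
For converting these nanosecond figures into clock cycles, a small C sketch follows. The function name ns_to_cycles is made up for illustration, and the clock speed is a parameter you have to supply; the 2 GHz used in the note below is the figure from the quoted text above, not a measured value.

```c
#include <assert.h>

/* cycles = latency in seconds * clock in Hz, which reduces to ns * GHz. */
static double ns_to_cycles(double latency_ns, double clock_ghz)
{
    assert(latency_ns >= 0.0 && clock_ghz > 0.0);
    return latency_ns * clock_ghz;
}
```

With this, ns_to_cycles(400.0, 2.0) gives 800 cycles for the dual case, and ns_to_cycles(300.0, 2.0) gives 600 cycles for the single-CPU case at the same assumed clock.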

If you want, you can have this source code to measure the average lookup
time. The average is influenced so heavily by random lookups into memory that
case 3 dominates the results.
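
Why case 3 dominates can be seen from a simple weighted average. The C sketch below is only an illustration: the struct and function names are made up, and the fraction of lookups served by each level is a hypothetical input, not something measured here.

```c
#include <assert.h>
#include <stddef.h>

/* One level of the memory hierarchy: the fraction of lookups it serves
   (hypothetical input) and its latency in cycles. */
struct level {
    double fraction;
    double latency_cycles;
};

/* Average lookup latency = sum over levels of fraction * latency. */
static double average_latency(const struct level *levels, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += levels[i].fraction * levels[i].latency_cycles;
    return sum;
}

/* Share of the average that level k is responsible for (0.0 to 1.0). */
static double share_of_level(const struct level *levels, size_t n, size_t k)
{
    assert(k < n);
    double total = average_latency(levels, n);
    if (total <= 0.0)
        return 0.0;
    return levels[k].fraction * levels[k].latency_cycles / total;
}
```

As a made-up example, take 95% of lookups at 2 cycles, 4% at an assumed 20 cycles, and just 1% going to main memory at 600 cycles. The average comes out near 8.7 cycles, and the 1% of memory lookups account for roughly two thirds of it.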

I am talking about measured values. Not guessed values like Hyatt is talking

>Thanks in advance,


Last modified: Thu, 07 Jul 11 08:48:38 -0700
