Computer Chess Club Archives



Subject: Re: DIEP NUMA SMP at P4 3.06Ghz with Hyperthreading

Author: Matt Taylor

Date: 23:05:05 12/13/02



On December 13, 2002 at 22:56:17, Robert Hyatt wrote:

>On December 13, 2002 at 16:41:07, Vincent Diepeveen wrote:
>
>>On December 13, 2002 at 16:03:47, Robert Hyatt wrote:
>>
>>your math is wrong for many reasons.
>>
>>0) it isn't 32 bytes but 64 bytes that you get at once
>>   guaranteed.
>
>Depends on the processor.  For PIII and earlier, it is _32_ bytes.  For
>the PIV it is 128 bytes.  I think AMD is 64...
>
>
>>1) you can guarantee that you need just one cache line
>
>Yes you can, by making sure you probe to a starting address that is
>divisible by the cache line size exactly.  Are you doing that?  Are
>you sure your table is initially aligned on a multiple of cache line
>size?  Didn't think so.  You can't control malloc() that well yet...
>And it isn't smart enough to know it should do that, particularly when
>the alignment is processor dependent.

You can detect that alignment. As for aligning with malloc, it is an easy trick.

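malloc(x) => (malloc(x + align - 1) + align - 1) & ~(align - 1)

(align must be a power of two, and the pointer malloc() actually returned has
to be kept around for free().)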
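As a rough illustration of the rounding trick, here is a minimal C sketch. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical, not from any engine or library. It assumes align is a power of two, and it stashes the raw malloc() pointer just below the aligned block so it can be freed later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return `size` bytes whose address is a multiple of
   `align` (assumed to be a power of two), or NULL on failure. */
void *aligned_malloc_sketch(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate: worst-case padding plus room for the saved pointer. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the slot for the saved pointer, then round up to `align`. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Remember what malloc() really returned, just below the block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical counterpart: free a block from aligned_malloc_sketch(). */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

With a table allocated as aligned_malloc_sketch(bytes, 64), a probe that starts at a multiple of 64 bytes from the table base stays inside one 64-byte line. The 64 is only an example value; the right figure depends on the processor, as noted above.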

>>2) even if you need 2, then those aren't 400 clocks each
>>   cache line but the first one is 400 and the second
>>   one is only a very small part of that (consider the
>>   bandwidth the memory delivers!)
>
>Try again.  You burst load one cache line and that is all.  The first 8
>bytes comes across after a long latency.  The rest of the line bursts in
>quickly.  For the next cache miss, back you go to the long latency.

Actually, you probably won't incur much latency at all. The full latency figure
assumes that both RAS and CAS have to be re-latched into the memory. If the
second line sits in the DRAM row that is already open, only CAS has to be
re-latched, so data with good locality is cheaper to fetch.

-Matt




Last modified: Thu, 15 Apr 21 08:11:13 -0700
