Computer Chess Club Archives


Subject: Re: Some new hyper-threading info.

Author: Gerd Isenberg

Date: 11:27:55 04/16/04


On April 16, 2004 at 12:40:21, Robert Hyatt wrote:
<snip>
>>>>So a possible alternative to evaltable would be hashing in qsearch.
>>>>
>>>>Evaltable would be my guess though. A random lookup into a big hashtable on
>>>>a 400Mhz dual Xeon costs around 400ns when it is not in the cache.
>>>>
>>>
>>>
>>>
>>>Depends.  Use big memory pages and it costs 150 ns.  No TLB thrashing then.
>>
>>We all know that when things are in L2 cache it's faster. However, when using
>>hashtables, by definition you are busy with TLB thrashing.
>
>Wrong.
>
>Do you know what the TLB does?  Do you know how big it is?  Do you know what
>going from 4KB to 2MB/4MB page sizes does to that?
>
>Didn't think so...
>
>>
>>Use big hashtables, start doing lookups to your hashtable, and it costs on
>>average 400 ns on a dual k7 and your dual Xeon. Saying that under conditions A
>>and B, which hardly happen, it is 150ns makes no sense. When it's in L2 cache
>>on Opteron it just costs 13 cycles. Same type of comparison.
>
>Using 4MB pages, my hash probes do _not_ take 400ns.  You can say it all you
>want, but it will never be true.  The proof is intuitive for anyone
>understanding the relationship between number of virtual pages and TLB size.
>

Hi Bob,

I guess you are talking about P4/Xeon.

From what I have read so far about Opteron, there is a two-level data TLB as
"part" of the L1 data cache (1024 cache lines of 64 bytes each), which maps the
most recently used virtual addresses to their physical addresses. The primary
TLB has 40 entries: 32 for 4KB pages and only eight for 2MB pages. The secondary
TLB contains 512 entries, for 4KB pages only. De Vries explains the "expensive"
table walk very instructively.

Do you believe that, with huge random-access tables, let's say >= 512MB,
eight 2MB page entries help to avoid TLB thrashing?
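
A rough way to put numbers on this question is to compare the "reach" of each TLB level (entries times page size) with the table size. The sketch below uses only the entry counts quoted in this post. The uniform-random probe model and the helper names (`tlb_reach`, `hit_fraction`) are assumptions made for illustration, not measurements.

```python
KB = 1024
MB = 1024 * KB

# TLB entry counts as quoted in this post for the Opteron data TLB.
L1_ENTRIES_4K = 32    # primary TLB, 4KB pages
L1_ENTRIES_2M = 8     # primary TLB, 2MB pages
L2_ENTRIES_4K = 512   # secondary TLB, 4KB pages only

TABLE_SIZE = 512 * MB  # table size taken from the question


def tlb_reach(entries, page_size):
    """Bytes of address space that `entries` TLB entries can map at once."""
    return entries * page_size


def hit_fraction(reach, table_size):
    """Upper bound on the TLB hit rate for uniformly random probes.

    Assumption: every TLB entry currently maps a page of the table,
    which is optimistic for a real program.
    """
    return min(1.0, reach / table_size)


reach_l1_4k = tlb_reach(L1_ENTRIES_4K, 4 * KB)
reach_l2_4k = tlb_reach(L2_ENTRIES_4K, 4 * KB)
reach_l1_2m = tlb_reach(L1_ENTRIES_2M, 2 * MB)

# Number of pages needed to cover the whole table at each page size.
pages_4k = TABLE_SIZE // (4 * KB)
pages_2m = TABLE_SIZE // (2 * MB)

for name, reach in [
    ("primary TLB, 4KB pages", reach_l1_4k),
    ("secondary TLB, 4KB pages", reach_l2_4k),
    ("primary TLB, 2MB pages", reach_l1_2m),
]:
    bound = hit_fraction(reach, TABLE_SIZE)
    print(f"{name}: reach {reach // KB} KB, hit-rate bound {bound:.4%}")

print(f"pages to cover the table: {pages_4k} x 4KB or {pages_2m} x 2MB")
```

Under these assumptions the eight 2MB entries reach 16MB, about 3% of a 512MB table, against 2MB for the 512-entry secondary TLB with 4KB pages. This bounds only the hit rate. It says nothing about how the cost of a miss differs between page sizes.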

Are there special OS-dependent mallocs to get those huge pages?
What about using one 2MB page for some combined CONST or DATA segments?
It would be nice to guide the linker that way.

Thanks,
Gerd


some references:

Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors
Appendix A Microarchitecture for AMD Athlon™ 64 and AMD Opteron™ Processors
A.9 Translation-Lookaside Buffer


Understanding the detailed Architecture of AMD's 64 bit Core
                 by Hans de Vries

http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

3.3 The Data Cache Hit / Miss Detection: The cache tags and the primairy TLB's
3.4 The 512 entry second level TLB
3.16 The TLB Flush Filter CAM




Last modified: Thu, 15 Apr 21 08:11:13 -0700
