Computer Chess Club Archives


Subject: Re: Some new hyper-threading info.

Author: Vincent Diepeveen

Date: 17:05:47 04/16/04


On April 16, 2004 at 14:32:32, Anthony Cozzie wrote:

>On April 16, 2004 at 14:27:55, Gerd Isenberg wrote:
>
>>On April 16, 2004 at 12:40:21, Robert Hyatt wrote:
>><snip>
>>>>>>So a possible alternative to evaltable would be hashing in qsearch.
>>>>>>
>>>>>>Evaltable would be my guess though. A random lookup into a big hashtable on
>>>>>>a 400MHz dual Xeon costs around 400ns when it is not in the cache.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Depends.  Use big memory pages and it costs 150 ns.  No TLB thrashing then.
>>>>
>>>>We all know that when things are in L2 cache it's faster. However, when using
>>>>hashtables you are by definition busy with TLB thrashing.
>>>
>>>Wrong.
>>>
>>>Do you know what the TLB does?  Do you know how big it is?  Do you know what
>>>going from 4KB to 2MB/4MB page sizes does to that?
>>>
>>>Didn't think so...
>>>
>>>>
>>>>Use big hashtables, start doing lookups into them, and it costs on average
>>>>400 ns on a dual K7 and on your dual Xeon. Saying that it is 150ns under
>>>>conditions A and B, which hardly ever happen, makes no sense. When it's in L2
>>>>cache on Opteron it costs just 13 cycles. Same type of comparison.
>>>
>>>Using 4mb pages, my hash probes do _not_ take 400ns.  You can say it all you
>>>want, but it will never be true.  The proof is intuitive for anyone
>>>understanding the relationship between number of virtual pages and TLB size.
>>>
>>
>>Hi Bob,
>>
>>I guess you are talking about P4/Xeon.
>>
>>From what I have read so far about Opteron, there is a two-level data TLB as
>>"part" of the L1 data cache (1024 cache lines of 64 bytes each), which maps the
>>most-recently-used virtual addresses to their physical addresses. The primary
>>TLB has 40 entries: 32 for 4KB pages and only eight for 2MB pages. The secondary
>>contains 512 entries for 4KB pages only. De Vries explains the "expensive"
>>table walk very instructively.
>>
>>Do you believe, with huge random access tables, let's say >= 512MB,
>>that eight 2MB pages help to avoid TLB thrashing?
>>
>>Are there special OS-dependent mallocs to get those huge pages?
>>What about using one 2M page for some combined CONST or DATA-segments?
>>It would be nice to guide the linker that way.
>>
>>Thanks,
>>Gerd
>>
>>
>>some references:
>>
>>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>>Processors
>>Appendix A Microarchitecture for AMD Athlon™ 64 and AMD Opteron™ Processors
>>A.9 Translation-Lookaside Buffer
>>
>>
>>Understanding the detailed Architecture of AMD's 64 bit Core
>>                 by Hans de Vries
>>
>>http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
>>
>>3.3 The Data Cache Hit / Miss Detection: The cache tags and the primairy TLB's
>>3.4 The 512 entry second level TLB
>>3.16 The TLB Flush Filter CAM
>
>
>Opteron blows in this regard, but I believe that all 64(?) of the P4's TLB
>entries can be toggled to large pages.
>
>Also, it is worth noting that the Opteron's current TLB supports only 4MB of
>memory using the 4KB pages, so 8 MB is still an improvement :)  Hopefully when
>AMD transitions to the 90nm process the new core will fix this.
>
>anthony

Opteron doesn't blow at all. Just test how fast you can get data delivered on an
Opteron.

It's 2.5 times faster than Xeon in that respect.
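
The argument quoted above about TLB size versus page size comes down to simple arithmetic: TLB reach is the number of entries times the page size. The sketch below is only an illustration of that arithmetic, not a measurement. The Opteron entry counts are the ones Gerd quotes; the 64-entry P4 figure is Anthony's own uncertain "64(?)" guess combined with the 4MB page size Bob mentions; the helper names `tlb_reach` and `coverage` are made up for this example, and treating coverage as the chance that a uniformly random probe finds its page already mapped is a simplifying assumption.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes of address space that `entries` TLB entries can map at once."""
    return entries * page_size


def coverage(entries, page_size, table_size):
    """Fraction of a table of `table_size` bytes that the TLB can map, capped at 1.

    Assumption: for uniformly random probes this approximates the best-case
    fraction of probes that avoid a page-table walk.
    """
    return min(1.0, tlb_reach(entries, page_size) / table_size)


# Opteron figures as quoted in the thread (from Gerd's post).
opteron_l1_4k = tlb_reach(32, 4 * KB)    # primary TLB, 4KB pages
opteron_l2_4k = tlb_reach(512, 4 * KB)   # secondary TLB, 4KB pages only
opteron_l1_2m = tlb_reach(8, 2 * MB)     # primary TLB, 2MB pages

# P4 figure: assumes 64 entries, all usable for 4MB pages (unconfirmed in the thread).
p4_4m = tlb_reach(64, 4 * MB)

table = 512 * MB  # the hash table size Gerd uses as an example

for name, reach in [
    ("Opteron primary, 4KB pages", opteron_l1_4k),
    ("Opteron secondary, 4KB pages", opteron_l2_4k),
    ("Opteron primary, 2MB pages", opteron_l1_2m),
    ("P4 (assumed 64 entries), 4MB pages", p4_4m),
]:
    share = min(1.0, reach / table)
    print(f"{name}: reach {reach // KB} KB, covers {share:.2%} of a 512MB table")
```

With the entry counts Gerd quotes, the 512-entry secondary TLB reaches 2MB and the eight 2MB entries reach 16MB, so neither comes close to covering a 512MB table. Under the assumed P4 figure, 4MB pages would cover half of it.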




Last modified: Thu, 15 Apr 21 08:11:13 -0700
