Author: Vincent Diepeveen
Date: 17:05:47 04/16/04
On April 16, 2004 at 14:32:32, Anthony Cozzie wrote:

>On April 16, 2004 at 14:27:55, Gerd Isenberg wrote:
>
>>On April 16, 2004 at 12:40:21, Robert Hyatt wrote:
>><snip>
>>>>>>So a possible alternative to evaltable would be hashing in qsearch.
>>>>>>
>>>>>>Evaltable would be my guess though. Doing a random lookup to a big hashtable at
>>>>>>a 400Mhz dual Xeon costs when it is not in the cache around 400ns.
>>>>>>
>>>>>
>>>>>Depends. Use big memory pages and it costs 150 ns. No TLB thrashing then.
>>>>
>>>>We all know that when things are in L2 cache it's faster. However when using
>>>>hashtables by definition you are busy with TLB thrashing.
>>>
>>>Wrong.
>>>
>>>Do you know what the TLB does? Do you know how big it is? Do you know what
>>>going from 4KB to 2MB/4MB page sizes does to that?
>>>
>>>Didn't think so...
>>>
>>>>
>>>>Use big hashtables, start doing lookups to your hashtable and it costs on average
>>>>400 ns on a dual k7 and your dual Xeon. Saying that under conditions A and B,
>>>>which hardly happen, that it is 150ns makes no sense. When it's in L2 cache at
>>>>opteron it just costs 13 cycles. Same type of comparison.
>>>
>>>Using 4mb pages, my hash probes do _not_ take 400ns. You can say it all you
>>>want, but it will never be true. The proof is intuitive for anyone
>>>understanding the relationship between number of virtual pages and TLB size.
>>>
>>
>>Hi Bob,
>>
>>I guess you are talking about P4/Xeon.
>>
>>From what I have read so far about Opteron, there is a two-level Data-TLB as "part" of the
>>L1-Data Cache (1024 - 64 Byte cache lines), which maps the most-recently-used
>>virtual addresses to their physical addresses. The primary TLB has 40 entries,
>>32 for 4KB pages, only eight for 2MB pages. The secondary contains 512 entries
>>for 4KB pages only. De Vries explains the "expensive" table-walk very
>>instructively.
>>
>>Do you believe, with huge random access tables, let's say >= 512MB,
>>that eight 2MB pages help to avoid TLB thrashing?
>>
>>Are there special OS-dependent mallocs to get those huge pages?
>>What about using one 2M page for some combined CONST or DATA-segments?
>>It would be nice to guide the linker that way.
>>
>>Thanks,
>>Gerd
>>
>>
>>some references:
>>
>>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>>Processors
>>Appendix A Microarchitecture for AMD Athlon™ 64 and AMD Opteron™ Processors
>>A.9 Translation-Lookaside Buffer
>>
>>
>>Understanding the detailed Architecture of AMD's 64 bit Core
>> by Hans de Vries
>>
>>http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
>>
>>3.3 The Data Cache Hit / Miss Detection: The cache tags and the primairy TLB's
>>3.4 The 512 entry second level TLB
>>3.16 The TLB Flush Filter CAM
>
>
>Opteron blows in this regard, but I believe that all 64(?) of the P4's TLB
>entries can be toggled to large pages.
>
>Also, it is worth noting that the Opteron's current TLB supports only 4MB of
>memory using the 4KB pages, so 8 MB is still an improvement :) Hopefully when
>AMD transitions to the 90nm process the new core will fix this.
>
>anthony

Opteron doesn't blow at all. Just test how fast you can actually get data to
the processor on an Opteron. It's 2.5 times faster than the Xeon in that respect.
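
For anyone who wants to check the TLB numbers thrown around in this thread, here is a rough back-of-the-envelope sketch. It multiplies entry count by page size to get the "reach" of a TLB and compares that to a 512MB hashtable. The Opteron entry counts are the ones Gerd quotes above; the 64 entries for P4/Xeon with 4MB pages are only Anthony's guess, so treat that row as an assumption. The names tlb_reach and covered_fraction are made up for this illustration, and the coverage figure assumes uniformly random probes with every TLB entry mapping part of the table.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory that `entries` TLB entries can map at once."""
    return entries * page_size


def covered_fraction(entries: int, page_size: int, table_size: int) -> float:
    """Fraction of the table mapped by the TLB at any one moment.

    With uniformly random probes this is a rough estimate of how often
    a probe finds its translation already in the TLB.
    """
    return min(1.0, tlb_reach(entries, page_size) / table_size)


TABLE = 512 * MB  # hashtable size from Gerd's question

# (label, entries, page size); the last row is an assumed configuration
configs = [
    ("Opteron primary DTLB, 4KB pages", 32, 4 * KB),
    ("Opteron secondary DTLB, 4KB pages", 512, 4 * KB),
    ("Opteron primary DTLB, 2MB pages", 8, 2 * MB),
    ("P4/Xeon, 4MB pages (64 entries assumed)", 64, 4 * MB),
]

for label, entries, page_size in configs:
    reach_mb = tlb_reach(entries, page_size) / MB
    frac = covered_fraction(entries, page_size, TABLE)
    print(f"{label:42s} reach {reach_mb:8.3f} MB  covers {frac:7.2%}")
```

Under these assumptions, eight 2MB entries map 16MB, about 3% of a 512MB table, while 64 entries of 4MB would map 256MB, half of it. This says nothing about the cost of a TLB miss or about raw memory latency, which is what the 400ns versus 150ns argument is really about.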
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.