Author: Robert Hyatt
Date: 18:48:24 09/28/02
On September 28, 2002 at 20:59:37, Anthony Cozzie wrote:

>On September 28, 2002 at 12:23:32, Robert Hyatt wrote:
>
>>On September 27, 2002 at 23:29:58, Anthony Cozzie wrote:
>>
>>>Recently, I profiled my chess engine, and one function in particular stood out.
>>>The transposition probe function takes about 7% of the CPU time, or about 350
>>>cycles/call. All it does is access the transposition table, but the random
>>>nature of the accesses means that it usually misses in the cache AND the TLB,
>>>thus requiring 2 memory accesses at 100+ cycles each.
>>>
>>>In my engine, the search function generates the next move, makes the next move,
>>>checks if it is legal, checks if the opponent is in check, and recurses, so
>>>there are two calls to is_check() between when the transposition key is
>>>available and when the key is used. I tried inserting a prefetch instruction [I
>>>run an Athlon] with absolutely no effect. I even tried following the prefetch
>>>with a long loop to make SURE it would have enough time to access the memory,
>>>with no results. Lastly I tried a MOV instruction, also with no result. Am I
>>>just doing something wrong here?
>>>
>>>Has anyone else tried something similar with better results?
>>
>>
>>You are basically stuck in memory-latency land, and there is little you can
>>do.
>>
>>I doubt it is a TLB issue, but then that depends on whether your O/S used
>>4kb or 4mb pages... But if the TLB is getting crushed, that is more damaging
>>than the cache issue because a memory access takes three accesses, two to map
>>the address, one to fetch it.
>
>I am currently running linux 2.4.18/XFS/Debian. Apparently linux only supports
>4kb pages in user land [dunno about kernel land]. There is a generally
>recognized demand to add support for bigger pages because systems like oracle
>don't like 4kb pages ;-D There is a large page patch, but this involves hacking
>your memory into two chunks at boot time, and is a rather less than optimal
>solution. Seeing as how the AthlonXP has 40 TLB entries = 160KB memory TLB
>mapped at any given time and I use a 128 or even 256MB hash table, I think it's
>safe to say it has a page fault almost every transposition table access.
>
>Also, I'm no VM master, but why does it require *3* memory accesses? I would
>think 1 to read the page table, 1 to fetch the data. I really should take CMU's
>OS course . . . .

The map is in two levels. A one-level map would require a _huge_ chunk of
memory for every possible virtual page. That is one plus of the 4mb pages: it
reduces the map to one level with only 1024 pages...
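
To make the arithmetic above concrete, here is a rough sketch in C. It assumes classic 32-bit x86 paging without PAE, where a virtual address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit offset. The type and function names (split_4k, split_4k_address, memory_accesses_on_tlb_miss, tlb_reach_bytes) are made up for illustration and are not from any engine or kernel. The 40-entry TLB figure is the one quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit x86, 4 KB pages, no PAE (10 + 10 + 12 bit split). */
typedef struct {
    uint32_t dir_index;   /* bits 31..22: entry in the top-level map    */
    uint32_t table_index; /* bits 21..12: entry in the second-level map */
    uint32_t offset;      /* bits 11..0:  byte within the 4 KB page     */
} split_4k;

/* Split a virtual address the way a two-level walk would index it. */
split_4k split_4k_address(uint32_t va)
{
    split_4k s;
    s.dir_index   = va >> 22;
    s.table_index = (va >> 12) & 0x3FFu;
    s.offset      = va & 0xFFFu;
    return s;
}

/* On a TLB miss there is one memory access per map level, plus one more
 * to fetch the data itself: two levels -> 3 accesses, one level -> 2. */
int memory_accesses_on_tlb_miss(int map_levels)
{
    assert(map_levels >= 1);
    return map_levels + 1;
}

/* How much memory the TLB can cover at once: entries * page size. */
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}
```

With 4kb pages, tlb_reach_bytes(40, 4096) comes to 160KB, matching the figure in the quoted post, and memory_accesses_on_tlb_miss(2) gives the three accesses in question. With 4mb pages the second level disappears: the top 10 bits pick one of 1024 entries and the remaining 22 bits are the offset, so a miss costs two accesses.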
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.