Computer Chess Club Archives


Subject: Re: Reducing transposition table latency

Author: Robert Hyatt

Date: 18:48:24 09/28/02


On September 28, 2002 at 20:59:37, Anthony Cozzie wrote:

>On September 28, 2002 at 12:23:32, Robert Hyatt wrote:
>
>>On September 27, 2002 at 23:29:58, Anthony Cozzie wrote:
>>
>>>Recently, I profiled my chess engine, and one function in particular stood out.
>>>The transposition probe function takes about 7% of the CPU time, or about 350
>>>cycles/call.  All it does is access the transposition table, but the random
>>>nature of the accesses means that it usually misses in the cache AND the TLB,
>>>thus requiring 2 memory accesses at 100+ cycles each.
>>>
>>>In my engine, the search function generates the next move, makes the next move,
>>>checks if it is legal, checks if the opponent is in check, and recurses, so
>>>there are two calls to is_check() between when the transposition key is
>>>available and when the key is used.  I tried inserting a prefetch instruction [I
>>>run an Athlon] with absolutely no effect.  I even tried following the prefetch
>>>with a long loop to make SURE it would have enough time to access the memory,
>>>with no results.  Lastly I tried a MOV instruction, also with no result.  Am I
>>>just doing something wrong here?
>>>
>>>Has anyone else tried something similar with better results?
>>
>>
>>You are basically stuck in memory-latency land, and there is little you can
>>do.
>>
>>I doubt it is a TLB issue, but then that depends on whether your O/S used
>>4kb or 4mb pages...  But if the TLB is getting crushed, that is more damaging
>>than the cache issue because a memory access takes three accesses, two to map
>>the address, one to fetch it.
>
>I am currently running linux 2.4.18/XFS/Debian.  Apparently linux only supports
>4kb pages in user land [dunno about kernel land].  There is a generally
>recognized demand to add support for bigger pages because systems like oracle
>don't like 4kb pages ;-D  There is a large page patch, but this involves hacking
>your memory into two chunks at boot time, and is a rather less than optimal
>solution. Seeing as how the AthlonXP has 40 TLB entries = 160KB memory TLB
>mapped at any given time and I use a 128 or even 256MB hash table, I think it's
>safe to say it takes a TLB miss on almost every transposition table access.
>
>Also, I'm no VM master, but why does it require *3* memory accesses? I would
>think 1 to read the page table, 1 to fetch the data.  I really should take CMU's
>OS course . . . .

The map is in two levels.  A one-level map with 4KB pages would need an entry
for every possible virtual page, which is a _huge_ chunk of memory.  That is
one plus of the 4MB pages: the map drops to one level with only 1024 entries...
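
To put rough numbers on the above, here is a small sketch of the arithmetic.
It assumes classic 32-bit x86 paging without PAE (10-bit directory index,
10-bit table index, 12-bit offset) and 4-byte table entries.  The function
names are made up for illustration, and the 40-entry TLB figure is the one
quoted earlier in the thread, not something measured here.

```python
# Sketch only: classic 32-bit x86 paging, no PAE assumed.
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024
ADDRESS_SPACE = 2 ** 32
ENTRY_BYTES = 4  # assumed size of one page-table entry


def split_4k(vaddr):
    """Return (directory index, table index, offset) for a 4KB-page address."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def split_4m(vaddr):
    """Return (directory index, offset) for a 4MB-page address."""
    return (vaddr >> 22) & 0x3FF, vaddr & 0x3FFFFF


def flat_table_bytes(page_size):
    """Memory needed by a hypothetical one-level map covering all 4GB."""
    return (ADDRESS_SPACE // page_size) * ENTRY_BYTES


def accesses_on_tlb_miss(map_levels):
    """One memory access per map level, plus one to fetch the data itself."""
    return map_levels + 1


def tlb_reach(entries, page_size):
    """Bytes of memory mapped by the TLB at any one time."""
    return entries * page_size


if __name__ == "__main__":
    print("4KB split:", split_4k(0x12345678))
    print("4MB split:", split_4m(0x12345678))
    print("one-level map, 4KB pages:", flat_table_bytes(PAGE_4K), "bytes")
    print("one-level map, 4MB pages:", flat_table_bytes(PAGE_4M), "bytes")
    print("accesses, two-level map:", accesses_on_tlb_miss(2))
    print("accesses, one-level map:", accesses_on_tlb_miss(1))
    reach = tlb_reach(40, PAGE_4K)
    print("TLB reach:", reach // 1024, "KB")
    print("fraction of a 128MB hash covered:", reach / (128 * 1024 * 1024))
```

With 4KB pages a flat map would cost 4MB of entries, which is why the map is
split into two levels and a TLB miss costs two extra accesses.  With 4MB pages
the whole map is 1024 entries and the miss costs one extra access.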





Last modified: Thu, 15 Apr 21 08:11:13 -0700
