Computer Chess Club Archives

Subject: Re: Improving the endgame of my engine

Author: Gerd Isenberg
Date: 03:36:42 04/09/04
On April 08, 2004 at 18:13:40, Ed Schröder wrote:

<snip>
>>>There is an easy way out: just classify them in a form suitable for an
>>>"indirect call" using "switch-case", see:
>>>
>>>http://members.home.nl/matador/chess840.htm#INTRO
>>
>>Hi Ed,
>>
>>Did you intend to link to some other section of your pages?  Perhaps I'm just
>>too tired, but I can't see that the link you provide has any relation
>>whatsoever to the topic under discussion.
>>
>>Tord
>
>I think I misread; I thought one of your worries was all the time-consuming
>compares and jumps needed to reach the relevant parts of eval depending on the
>material on the board. If so, create a 2-dimensional translation table that
>converts the present material into an unbroken, continuous sequence of small
>values (0,1,2,3,4,5.....) and then use the result with switch-case.
>
>I have such an endgame table with over 30 entries (KPK, KRK, KBNK, KQKR, KNKPP,
>KBKPPP, KRPKR etc.) and instead of doing 30 expensive compares I just have to do
>2 initialization instructions, get the value from the translation table and then
>the switch-case will move me with just one assembler instruction to the right
>place (label). Imagine the speed gain.
>
>  switch (val) { case  0 : goto KPK;
>                 case  1 : goto KRK;
>                 case  2 : goto KNBK;
>                 case  3 : goto KQKR;
>                 ...................
>                 case 99 : goto whatever;
>               }
>
>Again, the compiler will translate this to just *one* instruction, even if you
>have 200 entries and thus save 200 compares.
>
>Ed
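
Ed's translation-table idea can be sketched in C roughly as follows. Everything here is hypothetical and only for illustration: the 6-bit per-side material signature, the 65 x 65 table indexed by [white signature][black signature], the enum names and the placeholder return values. Only four of Ed's endgames are entered and color-mirrored positions are ignored. Ed jumps to labels with goto; this sketch returns from each case instead. A compiler may lower a dense switch like this to a jump table (a range check plus one indirect jump) or, when every case just returns a constant, to a plain table lookup.

```c
/* Hypothetical piece counts for one side (king not included). */
typedef struct {
    int pawns, knights, bishops, rooks, queens;
} Material;

enum { SIG_OTHER = 64 };  /* catch-all signature for material the table ignores */

enum Endgame { EG_GENERAL = 0, EG_KPK, EG_KRK, EG_KBNK, EG_KQKR };

/* Translation table [white signature][black signature] -> endgame id.
   Static storage is zero-initialized, so unlisted pairs map to EG_GENERAL. */
static unsigned char endgame_id[SIG_OTHER + 1][SIG_OTHER + 1];

/* Pack one side's material into 6 bits: pawns 0..3, at most one of each piece. */
int material_signature(Material m) {
    if (m.pawns > 3 || m.knights > 1 || m.bishops > 1 || m.rooks > 1 || m.queens > 1)
        return SIG_OTHER;
    return m.pawns | m.knights << 2 | m.bishops << 3 | m.rooks << 4 | m.queens << 5;
}

static void set_entry(Material white, Material black, enum Endgame id) {
    endgame_id[material_signature(white)][material_signature(black)] = (unsigned char)id;
}

/* Fill the table once at startup (white is the stronger side in every entry). */
void init_endgame_table(void) {
    const Material none = {0, 0, 0, 0, 0};
    set_entry((Material){1, 0, 0, 0, 0}, none, EG_KPK);
    set_entry((Material){0, 0, 0, 1, 0}, none, EG_KRK);
    set_entry((Material){0, 1, 1, 0, 0}, none, EG_KBNK);
    set_entry((Material){0, 0, 0, 0, 1}, (Material){0, 0, 0, 1, 0}, EG_KQKR);
}

/* One table load, then one switch instead of a chain of material compares.
   The return values are placeholders standing in for specialised evaluators. */
int evaluate_endgame(Material white, Material black) {
    switch (endgame_id[material_signature(white)][material_signature(black)]) {
    case EG_KPK:  return 100;  /* would call a KPK evaluator */
    case EG_KRK:  return 500;  /* would call a KRK evaluator */
    case EG_KBNK: return 600;  /* would call a KBNK evaluator */
    case EG_KQKR: return 400;  /* would call a KQKR evaluator */
    default:      return 0;    /* EG_GENERAL: fall back to the normal eval */
    }
}
```

After init_endgame_table() has run, each call costs two signature computations, one table load and the switch, however many endgames the table covers.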

Hi Ed,

Yes, I like those indirect jumps too. With the MS compiler one may use __assume(0)
in the default case to get rid of the leading bounds check.
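
A minimal sketch of that default-case hint. `__assume(0)` is the MSVC intrinsic; GCC and Clang offer `__builtin_unreachable()` for the same purpose, and the macro below picks whichever is available. The hint is only valid if the caller guarantees the index is in range; any other value is undefined behavior. Whether the range check actually disappears depends on the compiler and optimization level. `dispatch` and its return values are hypothetical.

```c
#if defined(_MSC_VER)
#  define UNREACHABLE() __assume(0)
#elif defined(__GNUC__) || defined(__clang__)
#  define UNREACHABLE() __builtin_unreachable()
#else
#  define UNREACHABLE() ((void)0)  /* no hint available: the bounds check stays */
#endif

/* Contract: 0 <= id <= 3. Declaring the default case unreachable lets the
   compiler omit the range test in front of the jump table. */
int dispatch(int id) {
    switch (id) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    default: UNREACHABLE();
    }
    return 0;  /* only reachable without a hint and with a broken contract */
}
```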

With today's super-pipelined, state-of-the-art processors and their very
sophisticated branch prediction heuristics, those indirect jumps, whose target
changes with the material on the board, are probably often difficult or even
impossible to predict.

Therefore a leading, easy-to-predict if-statement that separates the most common
case from the special cases may be advantageous.
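
A sketch of that common-case filter, assuming (hypothetically, nothing here is measured) that most positions have no specialised endgame, i.e. the table yields EG_GENERAL. One conditional branch that goes the same way almost every time is cheap for the predictor, so only the rare endgame positions reach the switch and its possibly mispredicted indirect jump. The enum and evaluator names are illustrative stand-ins.

```c
enum { EG_GENERAL = 0, EG_KPK, EG_KRK, EG_KBNK };

/* Hypothetical evaluators; dummy bodies stand in for real code. */
static int eval_general(void) { return 0; }
static int eval_kpk(void)     { return 1; }
static int eval_krk(void)     { return 2; }
static int eval_kbnk(void)    { return 3; }

int evaluate_filtered(int endgame) {
    /* Common case first: a direct, highly predictable conditional branch. */
    if (endgame == EG_GENERAL)
        return eval_general();

    /* Rare case: the switch, which with many dense cases typically becomes
       an indirect jump through a table. */
    switch (endgame) {
    case EG_KPK:  return eval_kpk();
    case EG_KRK:  return eval_krk();
    case EG_KBNK: return eval_kbnk();
    default:      return eval_general();
    }
}
```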

Cheers,
Gerd


from

Understanding the detailed Architecture of AMD's 64 bit Core
                    by Hans de Vries

http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html


4.12 Instruction Cache Hit/Miss detection, The Current Page and BTAC

The basic components for the Instruction Cache hit/miss detection are essentially
the same as those for the data cache. See section 3.3: "The Data Cache Hit/Miss
Detection: The cache tags and the primary TLBs". The single-port Instruction
cache only needs a single tag RAM and a single TLB. The instruction cache also has
a second-level TLB (see section 3.4) and its own snoop tag RAM (section 3.19). All
these structures are relatively easy to recognize on the die photo.

The current page register holds address bits [47:15] of the "guessed"
Instruction Fetch address. The BTB only stores the lower 15 Instruction Fetch
address bits. The Fetch logic speculates that the next 16-byte instruction line
will be fetched from the same 32 kB page and that the upper address bits [47:15]
remain the same. Jumps and calls that cross the 32 kB border are mispredicted.
The higher bits of the fetch address [47:12] are needed for the cache hit/miss
logic. The virtual page address [47:12] is translated to a physical page address
[39:12]. This page address is then compared to the two physical address tags
read from the two-way set-associative instruction cache to see if there is a hit
in either way.
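
The address split described above can be modelled in software as follows. This is only an illustration of the bit arithmetic, not AMD's logic: the function names are hypothetical, the current page register is held here already shifted right by 15, and valid bits, set indexing and the TLB lookup itself are left out of the tag check.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOW_BITS  15                                /* BTB keeps bits [14:0] */
#define LOW_MASK  ((UINT64_C(1) << LOW_BITS) - 1)
#define ADDR_MASK ((UINT64_C(1) << 48) - 1)         /* 48-bit virtual addresses */

/* Guessed fetch address: bits [47:15] from the current page register,
   bits [14:0] from the BTB entry. */
uint64_t guess_fetch_address(uint64_t current_page_reg, uint64_t btb_low) {
    return ((current_page_reg << LOW_BITS) | (btb_low & LOW_MASK)) & ADDR_MASK;
}

/* A branch whose source and target differ in bits [47:15] leaves the
   32 kB region, so the guess above is wrong for it. */
bool crosses_32k(uint64_t from, uint64_t to) {
    return ((from ^ to) & ADDR_MASK & ~LOW_MASK) != 0;
}

/* Two-way set-associative hit check: compare the translated physical page
   number (bits [39:12]) with the tags read from both ways of the set.
   Returns the hitting way (0 or 1), or -1 for a miss. */
int icache_hit_way(uint64_t phys_page, const uint64_t tag[2]) {
    if (tag[0] == phys_page) return 0;
    if (tag[1] == phys_page) return 1;
    return -1;
}
```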

The new BTAC (Branch Target Address Calculator) can recover the full 48-bit
address from the displacement field in the instruction code two cycles after the
code is fetched from the cache. This address can then be compared with the
current page register to check whether the assumption that the branch would not
cross the 32 kB border was right. In the meantime, the cache hit/miss logic has
translated the guessed address, compared it with the two instruction cache tags
and produced the hit/miss result.
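
What the BTAC computes can be sketched like this for x86 relative jumps and calls, where the target is the address of the next instruction plus the sign-extended displacement. The hardware does this with an adder on the fetched instruction bytes; the functions below are a hypothetical software model, with the current page register again held as address bits [47:15] shifted down.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_MASK ((UINT64_C(1) << 48) - 1)   /* keep 48 virtual address bits */

/* Relative branch target: next instruction address + sign-extended displacement.
   Unsigned arithmetic wraps, so a negative displacement subtracts as expected. */
uint64_t branch_target(uint64_t next_ip, int32_t disp) {
    return (next_ip + (uint64_t)(int64_t)disp) & ADDR_MASK;
}

/* Verify the fetch logic's guess: the target's bits [47:15] must equal the
   current page register, otherwise the branch crossed the 32 kB border. */
bool guess_was_right(uint64_t current_page_reg, uint64_t target) {
    return ((target & ADDR_MASK) >> 15) == current_page_reg;
}
```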

The processor continues with the 16 instruction bytes fetched from the cache if
there was a cache hit and the 32 kB border was not crossed. If the 32 kB border
was crossed, the Fetch logic re-accesses the cache and ignores the hit/miss
result. If the 32 kB border was not crossed (so the TLB translated the right
fetch address) and there was a cache miss, then we may conclude that the miss
was real and that the line has to be reloaded from L2 or memory. The BTAC does
not help in the case of indirect branches. These still have to wait until the
correct address becomes available from the retired branch instruction.
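
The decision rules in the paragraph above can be summarised as a small function. This is a simplified, hypothetical model of the verification step only, not AMD's actual fetch logic; the indirect case ties back to Gerd's point that such jumps are hard to predict, since no early check exists for them.

```c
#include <stdbool.h>

typedef enum {
    FETCH_CONTINUE,      /* use the 16 bytes already read from the cache */
    FETCH_REACCESS,      /* guess crossed the 32 kB border: access the cache again */
    FETCH_RELOAD,        /* genuine miss: reload the line from L2 or memory */
    FETCH_NO_BTAC_CHECK  /* indirect branch: a wrong guess is only fixed at retirement */
} FetchAction;

FetchAction fetch_decision(bool indirect, bool crossed_32k, bool cache_hit) {
    if (indirect)
        return FETCH_NO_BTAC_CHECK;  /* BTAC cannot compute indirect targets */
    if (crossed_32k)
        return FETCH_REACCESS;       /* hit/miss result is ignored in this case */
    return cache_hit ? FETCH_CONTINUE : FETCH_RELOAD;
}
```
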
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.