Author: Gerd Isenberg
Date: 02:38:56 07/31/03
On July 31, 2003 at 04:04:54, Uri Blass wrote:
>On July 31, 2003 at 02:58:47, Gerd Isenberg wrote:
>
><snipped>
>>Fast cache is a limited resource. Short arrays/tables are in general more cache
>>friendly. If code/data is not in first or second level cache there may be delays
>>from about 140 ns (Memory latency) up to 400ns in worst case, to get data from
>>main memory via cache into the processor.
>
>How can I know if code/data is in the first or second level cache?
>I have no idea about the size of the code in my computer.
The processor knows...
This is one reason why optimizing for size often yields faster code than optimizing for speed.
Keep in mind that "locality" is a performance issue, for code as well as for data.
It is preferable to keep frequently used, time-critical subroutines in one C file
so that they are probably located in the same memory page.
>
>I can know the size of the exe file or the size of the source code but it tells
>me no information about the size of specific functions in the computer.
You may generate a map file or an assembler listing with addresses and opcodes.
In map files you find the offset and length of each public function.
>
><snipped>
>>>Where do you have 16 4K in this thread?
>>
>>16*4KByte pagesize == 64KByte if you use 16-bits from hashkey, like Bas.
>>I use 12 Bits as index and need a 4KByte table, one page.
>>
>>Gerd
>
>I use until today the simple way of checking all the irreversible moves.
>considering the fact that I have a slow searcher I do not think that it is
>important to change it(I consider tactical positions as more important for
>playing strength than quiet positions when there were a lot of reversible moves
>and I also believe that normal test suites cannot help me to detect the real
>progress so I prefer to keep the simple way)
>
That's perfectly OK.
But it is quite simple to implement this repetition hash table and to play around
with it; it takes only a few additional lines.
#define REP_HASH_BITS 12
#define REP_HASH_SIZE (1<<REP_HASH_BITS)
#define REP_HASH_MASK (REP_HASH_SIZE-1)
unsigned char repHash[REP_HASH_SIZE];
// should be initialized with zero at program startup
// and after doing an irreversible move during the game
...
When entering a node you increment:
    ++repHash[hashkey64 & REP_HASH_MASK];
When leaving a node you decrement:
    --repHash[hashkey64 & REP_HASH_MASK];
When looking for repetitions you do:
    if ( gameMoveCount50 + 4 <= gameMoveCount )
    {
        if ( repHash[hashkey64 & REP_HASH_MASK] )
        {
            // now look for a repetition by comparing hashkeys
            // of reversible moves with offsets -4, -6, -8, ...
            // if you don't find a repetition, a repHash collision occurred
            ....
            return DRAW;
        }
    }
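To make the fragments above easier to try out, here is a minimal self-contained sketch of how they might fit together. It is only one possible arrangement, not necessarily how any particular engine does it. The names keyHistory, MAX_PLY and the rep_* functions are hypothetical and not from the post. The sketch also assumes that the repetition check is made before the current node increments its own slot (otherwise the slot would always be non-zero), and that the caller maintains gameMoveCount50 when irreversible moves are made and unmade inside the search.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REP_HASH_BITS 12
#define REP_HASH_SIZE (1 << REP_HASH_BITS)
#define REP_HASH_MASK (REP_HASH_SIZE - 1)
#define MAX_PLY 1024                         /* hypothetical history limit */

static unsigned char repHash[REP_HASH_SIZE]; /* static storage: starts as all zero */
static uint64_t keyHistory[MAX_PLY];         /* hypothetical: hashkey of each earlier position */
static int gameMoveCount;                    /* number of positions entered so far */
static int gameMoveCount50;                  /* index of first position after the last irreversible move */

/* Call at program startup and after an irreversible move made in the game. */
void rep_reset(void)
{
    memset(repHash, 0, sizeof repHash);
    gameMoveCount50 = gameMoveCount;
}

/* Check BEFORE rep_enter() of the current node. Returns 1 on repetition. */
int rep_is_repetition(uint64_t hashkey64)
{
    if (gameMoveCount50 + 4 <= gameMoveCount &&
        repHash[hashkey64 & REP_HASH_MASK])
    {
        /* Slot is occupied: confirm by comparing full keys at offsets -4, -6, ... */
        for (int i = gameMoveCount - 4; i >= gameMoveCount50; i -= 2)
            if (keyHistory[i] == hashkey64)
                return 1;
        /* No match: another position shares the slot (repHash collision). */
    }
    return 0;
}

/* Entering a node: remember the key and increment its slot. */
void rep_enter(uint64_t hashkey64)
{
    assert(gameMoveCount < MAX_PLY);
    keyHistory[gameMoveCount++] = hashkey64;
    ++repHash[hashkey64 & REP_HASH_MASK];
}

/* Leaving a node: undo rep_enter() for the same key. */
void rep_leave(uint64_t hashkey64)
{
    assert(gameMoveCount > 0 && repHash[hashkey64 & REP_HASH_MASK] > 0);
    --repHash[hashkey64 & REP_HASH_MASK];
    --gameMoveCount;
}
```

In a search one would call rep_is_repetition(key) first and return the draw score if it is true, then rep_enter(key), search the moves, and rep_leave(key) on the way out. Most nodes find an empty slot and skip the key comparison loop entirely, which is the whole point of the table.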
>I was interested in the subject because I thought that it may help me to
>do better design of other arrays or better design of the code.
>
>If I understand correctly you claim that there is no difference between 4 kbytes
>and smaller arrays when you have random-accesses of the array
>but there may be a difference between 4 kbytes and bigger arrays unless the
>access is not random.
No, not "no difference"; there is even another memory granularity:
cache lines of 32 or 64 bytes, depending on the processor.
But with one 4K-aligned (0x*000) 4-KByte block it is certain that this block
requires only one TLB entry. My competence in this field is limited, though, and I
have only a vague picture...
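As a rough illustration of the alignment point, the following sketch asks the compiler for a 4K-aligned table and counts how many pages an address range touches. It assumes 4-KByte pages and a C11 compiler that honors alignas(4096) for static objects (GCC and Clang do as far as I know; the standard leaves such large alignments implementation-defined). PAGE_SIZE, pages_spanned and rep_hash_pages are made-up names for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u        /* assumption: 4-KByte pages */
#define REP_HASH_SIZE 4096

/* Request page alignment so the table cannot straddle two pages. */
static alignas(PAGE_SIZE) unsigned char repHash[REP_HASH_SIZE];

/* Number of pages touched by the byte range [addr, addr + size). */
unsigned pages_spanned(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr / PAGE_SIZE;
    uintptr_t last  = (addr + size - 1) / PAGE_SIZE;
    return (unsigned)(last - first + 1);
}

/* Pages (and so TLB entries) needed for the aligned table. */
unsigned rep_hash_pages(void)
{
    return pages_spanned((uintptr_t)repHash, sizeof repHash);
}
```

With the alignment, rep_hash_pages() should give 1. The same 4 KByte starting at an address such as 0x10800 would touch two pages, so an unaligned table may cost a second TLB entry.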
You may read:
AMD Athlon™ Processor x86 Code Optimization Guide,
Appendix A, "AMD Athlon™ Processor Microarchitecture", page 203 and following,
e.g. on page 208:
Data Cache
The L1 data cache contains two 64-bit ports. It is a write-allocate
and writeback cache that uses an LRU replacement policy. The data cache and
instruction cache are both two-way set-associative and 64-Kbytes in size. It is
divided into 8 banks where each bank is 8 bytes wide. In addition, this cache
supports the MOESI (Modified, Owner, Exclusive, Shared, and Invalid) cache
coherency protocol and data parity.
The L1 data cache has an associated two-level TLB structure.
The first-level TLB is fully associative and contains 32 entries
(24 that map 4-Kbyte pages and eight that map 2-Mbyte or
4-Mbyte pages). The second-level TLB is four-way set
associative and contains 256 entries, which can map 4-Kbyte
pages.
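To put the numbers from the quote next to the table sizes discussed in this thread, here is a small arithmetic sketch. It only multiplies and divides; the function names are invented for the example, and the 64-byte cache line is one of the two sizes mentioned above, taken here as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* How many units (pages or cache lines) a table of table_bytes occupies,
   assuming it starts on a unit boundary. */
size_t units_needed(size_t table_bytes, size_t unit_bytes)
{
    assert(unit_bytes > 0);
    return (table_bytes + unit_bytes - 1) / unit_bytes;
}

/* Memory covered by a TLB level with the given number of entries. */
size_t tlb_reach_bytes(size_t entries, size_t page_bytes)
{
    return entries * page_bytes;
}
```

For example, units_needed(1 << 12, 4096) is 1 page for the 12-bit table and units_needed(1 << 16, 4096) is 16 pages for the 16-bit one. In 64-byte cache lines that is 64 versus 1024 lines. The 24 first-level entries for 4-KByte pages give tlb_reach_bytes(24, 4096) = 98304 bytes (96 KByte), and the 256 second-level entries give 1 MByte.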
Gerd
>
>Uri
Last modified: Thu, 15 Apr 21 08:11:13 -0700