Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Programming question.

Author: Bruce Moreland

Date: 18:06:08 03/19/99

Go up one level in this thread



On March 19, 1999 at 20:58:05, Robert Hyatt wrote:

>On March 19, 1999 at 16:06:10, Inmann Werner wrote:
>
>>Hello.
>>
>>Today I changed my hashtables from fixed one ta allocated one.
>>
>>Example
>>Old
>>
>>long hindex[65536]
>>signed char hcolor[65536]
>>signed short int value[65536]
>>.....
>>
>>New
>>
>>long *hindex
>>signed char *hcolor
>>signed short int *value
>>....
>>
>>hindex=malloc(4*65536)
>>hcolor=malloc(65536)
>>value=malloc(2*65536)
>>....
>>
>>Works fine but:
>>
>>On my P90, the NPS decreases about 30%  !!!!!
>>On my Cyrix 233 everything is fine (same speed)
>>
>>Do I anything extremly wrong or what happens here. Has this something to do
>>with Prozessor cache or what?
>>
>>Any suggestions??
>>
>>Werner
>>
>>P.S.: Excuse my C, I come from Cobol....
>>I use Watcom C Version 11 Compiler.
>
>
>here is one useful trick:  The PC L2 cache is fetched in blocks of 32 bytes
>from memory in a 'burst'.  You want to make sure that a hash entry doesn't
>'span' two cache lines if you can help it.  Here is a way:
>
>TABLE *hash_table
>
>hash_table=malloc(tablesize+15);
>hash_table=(hash_table+15)&~15;
>
>what that does (and you will probably have to do some casting to void * to make
>the compiler accept some of the above) is do the malloc for the normal size +
>15 bytes, then we add 15 to that pointer and then scrub the low order 4 bits to
>make the pointer align on 16 byte (and also 32 byte) boundaries.  malloc() will
>align on the boundary required by the hardware, which generally tops out at 8
>bytes.  This aligns it better.
>
>Although I am not saying that is your problem...
>
>Your problem is that a single hash table entry is scattered all over memory,
>so that if you have draft, value, move, wtm, signature, you need _five_ cache
>line fills to get one entry.
>
>Put the whole entry into a structure like this:
>
>struct hash_entry {
>  long long signature
>  short value;
>  short move;
>  etc.
>}
>
>then use htable->signature to get to the signature, htable->value to get to
>the value.  Now with everything in adjacent bytes, one memory 'burst' will suck
>in the whole hash entry and each value can be accessed in cache speed, rather
>than memory speed.
>
>it will make a huge difference...

He had essentially the same code before, he just turned three static arrays into
three arrays initialized at runtime.

This made his code slow down by 30% and that's insane.  I can imagine something
slowing down by 30% if it spends all of its time in one loop, but a chess
program is more complicated than this, I think it would be hard to kill yourself
*that* badly with a cache issue.

You can talk about how to make it faster in general, but the issue here isn't
poor performance, the issue is a huge change in performance due to something
that shouldn't have caused perfmance to change hugely.

bruce



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.