Author: Robert Hyatt
Date: 18:50:03 01/31/99
Go up one level in this thread
On January 31, 1999 at 20:02:37, Bruce Moreland wrote:
>My program is single-processor single-threaded, and runs on Windows NT. I
>decided to see what would happen if I added critical section locking to my hash
>table.
>
>I added something to the effect of ...
>
> EnterCriticalSection(&argcs[hash.key & 1023]);
>
>... at the start of my hash table probe, a corresponding call to
>"LeaveCriticalSection" at the end of this function, another pair around the guts
>of my hash table store function, and a third pair around pawn structure
>probe/evaluate/store, and my speed dropped by approximately 10%
yes, but you are using the horrible mutex-type stuff in windows. My "lock"
function turns into a very short piece of code. Here is what I do under linux:
# define exchange(adr,reg) \
({ volatile int _ret; \
asm volatile ("xchgw %0,%1" \
: "=q" (_ret), "=m" (*(adr)) /* Output %0,%1 */ \
: "m" (*(adr)), "0" (reg)); /* Input (%2),%0 */ \
_ret; \
})
that is a macro that lets me get to the Intel exchange instruction, that is
atomic in nature.. that is I get the old value and store a new value in one
bus burst operation, to guarantee that only one cpu can get the original 'old'
value no matter how many try to do this at the same time. I then use this in
a lock like this:
# define Lock(p) while(exchange(&p,1)) while(p)
what this does is to 'atomically' check the lock variable by exchanging it with
a "1". If I get a non-zero, it is already set, so the first while is executed.
I then go into a tight while loop just loading/testing the lock directly without
the bus-locking exchange instruction, so that this loop runs out of cache and
has no effect on other threads. But when another CPU stores the 0 value to
clear the lock, my cpu's cached copy gets invalidated and the while failes.
Back to the original while to try to atomically set the lock to 1 and be sure it
is a 0 (which it might not be, since multiple cpus could be spinning here at the
same time).
The point here is that there are no system calls of any kind, and I can't
measure any slowdown at all with one thread running with the lock in, or
with the program compiled with the SMP stuff left out...
>
>This is in the single-threaded app. There would be additional problems if I got
>collisions when dealing with multiple processors, but using 1024 critical
>sections should mnimize this, I think.
>
>I had been wondering about this myself, I'm glad Bob asked the question.
>
>The possibility of a mis-probe would be pretty high without locking, I'd think.
>
>bruce
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.