Author: Eugene Nalimov
Date: 20:18:21 01/31/99
Go up one level in this thread
On January 31, 1999 at 21:50:03, Robert Hyatt wrote:
>On January 31, 1999 at 20:02:37, Bruce Moreland wrote:
>
>>My program is single-processor single-threaded, and runs on Windows NT. I
>>decided to see what would happen if I added critical section locking to my hash
>>table.
>>
>>I added something to the effect of ...
>>
>> EnterCriticalSection(&argcs[hash.key & 1023]);
>>
>>... at the start of my hash table probe, a corresponding call to
>>"LeaveCriticalSection" at the end of this function, another pair around the guts
>>of my hash table store function, and a third pair around pawn structure
>>probe/evaluate/store, and my speed dropped by approximately 10%
>
>yes, but you are using the horrible mutex-type stuff in windows. My "lock"
>function turns into a very short piece of code. Here is what I do under linux:
>
># define exchange(adr,reg) \
> ({ volatile int _ret; \
> asm volatile ("xchgw %0,%1" \
> : "=q" (_ret), "=m" (*(adr)) /* Output %0,%1 */ \
> : "m" (*(adr)), "0" (reg)); /* Input (%2),%0 */ \
> _ret; \
> })
>
>that is a macro that lets me get to the Intel exchange instruction, that is
>atomic in nature.. that is I get the old value and store a new value in one
>bus burst operation, to guarantee that only one cpu can get the original 'old'
>value no matter how many try to do this at the same time. I then use this in
>a lock like this:
>
># define Lock(p) while(exchange(&p,1)) while(p)
>
>what this does is to 'atomically' check the lock variable by exchanging it with
>a "1". If I get a non-zero, it is already set, so the first while is executed.
>I then go into a tight while loop just loading/testing the lock directly without
>the bus-locking exchange instruction, so that this loop runs out of cache and
>has no effect on other threads. But when another CPU stores the 0 value to
>clear the lock, my cpu's cached copy gets invalidated and the while failes.
>Back to the original while to try to atomically set the lock to 1 and be sure it
>is a 0 (which it might not be, since multiple cpus could be spinning here at the
>same time).
>
>The point here is that there are no system calls of any kind, and I can't
>measure any slowdown at all with one thread running with the lock in, or
>with the program compiled with the SMP stuff left out...
Bob is right - if there is a chance that blocking thread
will release a lock soon, it's better to loop and wait.
There will be no cost of OS call, of thread swithing if
lock is really busy, etc.
Bob, your code will suffer if there will be other activity
on the machine, so that blocking thread will be suspended
while acquiring a lock. The proper solution will be to
introduce counter in the waiting loop, and if lock will not
be released in (say) 100 iterations, let the other threads
run (I don't know how that can be done in OS-independent
way). You can reduce chances of such a situation by
introducing the array of locks (as was done by Bruce in his
example), and using some bits of your hash key to index it.
Don't know if that's necessary for 4 threads on the otherwise
unoccupied machine, but you'd better use that for less ideal
conditions. Actually, SMP TB probing code use that technique.
Eugene
>>This is in the single-threaded app. There would be additional problems if I got
>>collisions when dealing with multiple processors, but using 1024 critical
>>sections should mnimize this, I think.
>>
>>I had been wondering about this myself, I'm glad Bob asked the question.
>>
>>The possibility of a mis-probe would be pretty high without locking, I'd think.
>>
>>bruce
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.