Author: Robert Hyatt
Date: 21:16:12 01/31/99
On January 31, 1999 at 23:18:21, Eugene Nalimov wrote:
>On January 31, 1999 at 21:50:03, Robert Hyatt wrote:
>
>>On January 31, 1999 at 20:02:37, Bruce Moreland wrote:
>>
>>>My program is single-processor single-threaded, and runs on Windows NT. I
>>>decided to see what would happen if I added critical section locking to my hash
>>>table.
>>>
>>>I added something to the effect of ...
>>>
>>> EnterCriticalSection(&argcs[hash.key & 1023]);
>>>
>>>... at the start of my hash table probe, a corresponding call to
>>>"LeaveCriticalSection" at the end of this function, another pair around the guts
>>>of my hash table store function, and a third pair around pawn structure
>>>probe/evaluate/store, and my speed dropped by approximately 10%.
>>
>>yes, but you are using the horrible mutex-type stuff in windows. My "lock"
>>function turns into a very short piece of code. Here is what I do under linux:
>>
>># define exchange(adr,reg) \
>> ({ volatile int _ret; \
>> asm volatile ("xchgl %0,%1" /* 32-bit exchange */ \
>> : "=q" (_ret), "=m" (*(adr)) /* Output %0,%1 */ \
>> : "m" (*(adr)), "0" (reg)); /* Input (%2),%0 */ \
>> _ret; \
>> })
>>
>>That is a macro that lets me get at the Intel exchange instruction,
>>which is atomic in nature: I get the old value and store a new value
>>in one locked bus operation, which guarantees that only one cpu can
>>get the original 'old' value no matter how many try to do this at the
>>same time. I then use this in a lock like this:
>>
>># define Lock(p) while(exchange(&p,1)) while(p)
>>
>>What this does is 'atomically' test the lock variable by exchanging it
>>with a "1". If I get back a non-zero value, the lock is already set,
>>so the first while is executed. I then go into a tight while loop that
>>just loads and tests the lock directly, without the bus-locking
>>exchange instruction, so that this loop runs out of cache and has no
>>effect on other threads. But when another cpu stores a 0 to clear the
>>lock, my cpu's cached copy gets invalidated and the inner while fails.
>>Then it is back to the outer while, to try to atomically set the lock
>>to 1 and be sure it was a 0 (which it might not be, since multiple
>>cpus could be spinning here at the same time).
>>
>>The point here is that there are no system calls of any kind, and I can't
>>measure any slowdown at all with one thread running with the lock in, or
>>with the program compiled with the SMP stuff left out...
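To make that complete: the unlock side is implied above - it is just a
plain store of zero, and that store is what invalidates the cached
copies the spinners are reading. Usage comes out roughly like this
(HashStore() and hash_lock are invented names for illustration, not the
real Crafty ones):

# define Unlock(p) (p)=0                /* plain store; no atomic needed */

  volatile int hash_lock=0;             /* 0 = free, 1 = held */

  void HashStore(void) {                /* invented name, for illustration */
    Lock(hash_lock);                    /* spin until we set it to 1 atomically */
    /* ... read-modify-write the shared hash entry ... */
    Unlock(hash_lock);                  /* the store of 0 is what the other
                                           cpus' inner while loops notice */
  }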
>
>Bob is right - if there is a chance that the thread holding the
>lock will release it soon, it's better to loop and wait. There is
>then no cost of an OS call, or of a thread switch if the lock
>turns out to be busy, etc.
>
>Bob, your code will suffer if there is other activity on the
>machine, so that the thread holding the lock can be suspended
>while others are trying to acquire it. The proper solution is to
>introduce a counter in the waiting loop, and if the lock is not
>released in (say) 100 iterations, let the other threads run (I
>don't know how that can be done in an OS-independent way).
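What you describe would come out something like this, I assume - just a
sketch, using sched_yield() as the Linux way to give up the processor
(LockYield and SPIN_LIMIT are invented names, and the 100 is your
example number):

# include <sched.h>

# define SPIN_LIMIT 100                 /* Eugene's example number */
# define LockYield(p)                   \
    while (exchange(&p,1)) {            \
      int _i=0;                         \
      while (p)                         \
        if (++_i >= SPIN_LIMIT) {       \
          sched_yield();                /* let the lock holder run */ \
          _i=0;                         \
        }                               \
    }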
I believe Windows has such a lock mechanism, doesn't it? But in
any case, my algorithm suffers for other reasons if there are too
many processes running and not enough cpus for them...
I have a "idle loop" in ThreadWait() where an idle thread spins on a
pointer that is zero until another thread sticks something in there. If
I am not on a machine by myself, the operating system may well reach a point
where it has to choose between a process spinning in this idle loop, and one
doing useful work that will get this thread out of the idle loop. And since
it has no idea who is doing what, it can pick the wrong one and kill
performance.
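The loop itself is trivial - roughly this shape (the names here are
invented; the real structures are different):

  volatile void *work_pointer=0;        /* another thread stores here */

  void ThreadWait(void) {
    while (!work_pointer);              /* spins out of cache until some
                                           producer stores a pointer */
    /* ... pick up the work and start searching ... */
  }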
>You can reduce the chances of such a situation by introducing an
>array of locks (as was done by Bruce in his example), and using
>some bits of your hash key to index it. I don't know if that's
>necessary for 4 threads on an otherwise unoccupied machine, but
>you'd better use it under less ideal conditions. Actually, the
>SMP TB probing code uses that technique.
>
>Eugene
This isn't an issue yet. I have run some GPSS simulations and found
that for up to 16 threads, a single lock per table is ok. But once I
pass 16, multiple locks are better. I did this simulation a few years
ago when looking at the new Cray machines, when I designed my PhD
dissertation...
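When I do need it, it layers right on top of the same Lock() macro -
something like this, with a made-up table size and wrapper name:

# define N_LOCKS 64                     /* made-up; any power of two */

  volatile int hash_locks[N_LOCKS];     /* all start out 0 (free) */

  void HashProbeLocked(unsigned int key) {   /* invented wrapper */
    volatile int *l=&hash_locks[key & (N_LOCKS-1)];
    Lock(*l);                           /* only probes that map to this
                                           lock contend with each other */
    /* ... probe or store the entry for this key ... */
    Unlock(*l);
  }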
>
>>>This is in the single-threaded app. There would be additional
>>>problems if I got collisions when dealing with multiple processors,
>>>but using 1024 critical sections should minimize this, I think.
>>>
>>>I had been wondering about this myself, I'm glad Bob asked the question.
>>>
>>>The possibility of a mis-probe would be pretty high without locking, I'd think.
>>>
>>>bruce
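For reference, the setup behind Bruce's 1024 sections is presumably
close to the following - the init loop and the wrapper are my
assumptions; argcs and the key masking come straight from his post:

#include <windows.h>

CRITICAL_SECTION argcs[1024];

  void InitHashLocks(void) {            /* call once at startup */
    int i;
    for (i=0; i<1024; i++)
      InitializeCriticalSection(&argcs[i]);
  }

  void HashProbeCS(unsigned int key) {  /* invented wrapper */
    EnterCriticalSection(&argcs[key & 1023]);
    /* ... probe or store the hash entry ... */
    LeaveCriticalSection(&argcs[key & 1023]);
  }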