Author: Dezhi Zhao
Date: 13:30:20 12/20/02
On December 19, 2002 at 11:40:11, Robert Hyatt wrote:
>For those remembering the discussion from a couple of weeks ago, I
>had run into a strange problem with getting an inline asm lock to
>work. I was playing with this because I was following the Intel
>guideline of adding a "pause" to the "shadow-lock" part of the code.
>
>First, the inline asm code:
>
>void static __inline__ LockX86(volatile int * lock) {
> int dummy;
> asm __volatile__ (
> "1: movl $1, %0" "\n\t"
> " xchgl (%1), %0" "\n\t"
> " testl %0, %0" "\n\t"
> " jz 3f" "\n\t"
> "2: pause" "\n\t"
> " movl (%1), %0" "\n\t"
> " testl %0, %0" "\n\t"
> **** " jnz 2b" "\n\t"
> " jmp 1b" "\n\t"
> "3:" "\n\t"
> : "=&q" (dummy)
> : "q" (lock)
> : "cc", "memory");
>}
>
Can we safely delete the xchgl (%1), %0 instruction here? There is a similar
example in the Intel spin lock application note. However, the Intel example goes
without an xchg instruction.
>First, the lock now works. The bug was on the **** line above. I
>had incorrectly written "jz". To explain the code first, read on...
>
>This is based on the "shadow lock" approach to avoid frying the bus
>when a processor is spinning. The _real_ lock must always be set/tested
>with an atomic-type instruction, and the xchgl (exchange long) instruction
>does this in an indivisible way. Unfortunately, it bypasses a cache hit
>and runs out to memory to actually grab the old value and replace it with a
>new value while the bus is locked.
>
>Since I need to spin on that lock until it is a zero (assuming it is already
>set/held by another thread) looping on the xchgl instruction would _really_
>interfere with the other processors that are doing useful work. In comes the
>shadow lock.
>
>If the xchgl instruction shows that the lock is already non-zero, I jump
>to a loop that tests this value with a movl instruction which loops on the
>value stored in cache. When another processor writes back to the lock
>variable to set it to zero, my cache line for that word gets invalidated and
>we reload it from memory and see the new zero contents. While I am looping
>I don't execute an exchange instruction, just a simple move, which means my
>processor is not accessing the memory bus at all, letting the other processors
>run as fast as possible. When the move finds a zero value, it goes back to the
>exchange instruction to do the test again atomically. If it is still zero, we
>exit the loop, otherwise we hit the shadow spinlock again and spin on cache.
>
>This is important because in the above code, the **** instruction used to be
>"jz", which is wrong because it causes an infinite loop. If, when the lock
>code is executed, the lock is zero, the exchange and then test/jz instructions
>will take me out of the lock as we found a zero and it is now set to a 1. But
>if the lock is initially 1, I would hit the shadow spin loop, and I would loop
>if the lock had been cleared, which is bad, or I would jump back to the exchange
>if the lock was still 1, which is wrong also. But the point is the loop would
>hang if I entered the code with the lock set to 1 already, and someone cleared
>it while I was between the exchange and the shadow lock loop.
>
>And it did hang, but it actually played complete games on ICC without a
>problem, and then it would hang in three consecutive games and lose on time.
>
>Why is that interesting? It suggests that for the most part, when this code
>is called, the lock is zero. Which is what I had thought all along with just
>four threads running. This means that the _locks_ are not really affecting
>my search speed, contrary to what "some" would like to suggest. If I were to
>use 16 processors, I'm sure it would happen more often. But for now, the locks
>do not appear to be a performance bottleneck.
>
>BTW the above code works fine if you are running Linux, or using gcc/gas on
>other platforms. It is not quite Microsoft syntax, as MS reverses the
>operands to dst, src rather than the AT&T approach of src, dst. Also there
>are other syntactical issues dealing with [] vs () and $constant and so forth.
>
>I think I am next going to work on making the other asm all work by inlining
>it to dump a lot of call instruction overhead that is scattered around...
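
For comparison, here is a minimal sketch of the same shadow-lock idea written with C11 atomics instead of inline asm. It is not the code from the post: the names spin_lock_t, SPIN_LOCK_INIT, spin_lock, spin_unlock and cpu_relax are made up for illustration, and it assumes a compiler that provides <stdatomic.h>. The inner while loop plays the role of the "2:" shadow loop, and its "!= 0" test corresponds to the jnz on the **** line.

```c
#include <stdatomic.h>

/* All names below are hypothetical, chosen only for this sketch. */

/* "pause" hint on x86 (GCC-style inline asm); a no-op elsewhere. */
#if defined(__i386__) || defined(__x86_64__)
#define cpu_relax() __asm__ __volatile__("pause" ::: "memory")
#else
#define cpu_relax() ((void)0)
#endif

typedef struct { atomic_int held; } spin_lock_t;   /* 0 = free, 1 = held */
#define SPIN_LOCK_INIT { 0 }

static inline void spin_lock(spin_lock_t *l) {
    for (;;) {
        /* Real lock: atomic exchange, the only step that can acquire. */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;                     /* old value was 0: we own the lock */

        /* Shadow lock: spin on plain loads while the lock is non-zero,
         * then go back and retry the atomic exchange. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

static inline void spin_unlock(spin_lock_t *l) {
    /* Release store so writes made under the lock are visible first. */
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Writing the inner test as "== 0" would reproduce the bug described above: the waiter would keep spinning once the lock was cleared, and would hammer the exchange while it was still held.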
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.