Computer Chess Club Archives

Subject: Re: bugged code

Author: Robert Hyatt
Date: 12:40:58 02/26/03

On February 26, 2003 at 09:05:38, Vincent Diepeveen wrote:

>On February 25, 2003 at 21:27:13, Matt Taylor wrote:
>
>>On February 25, 2003 at 18:53:53, Vincent Diepeveen wrote:
>>
>>>On February 25, 2003 at 13:28:43, Matt Taylor wrote:
>>>
>>>hello are you trying to measure the quantum of a thread in your code?
>>>
>>>how does this measure how fast a thread gets signalled *anyhow* with
>>>WaitForSingleObject?
>><snip>
>>
>>Yes, I am measuring a thread quantum. WaitForSingleObject can potentially switch
>>faster (on the order of nanoseconds possibly, definitely a few microseconds at
>>worst), but that's not the issue -- Crafty isn't using WaitForSingleObject.
>>Crafty is using its own custom spin lock.
>>
>>You asserted that Crafty would spin on the locked data until its timeslice (or
>>quantum) was up. That just isn't true unless someone is dumb enough to run mt=2
>>with 1 processor in their system. I think Crafty even has code to prevent that,
>>but I'm not sure. With 1 thread, it is not possible to block on your own locks
>>unless you have a bug. The spin lock is very fast, faster than making a function
>>call (neglecting the fact that the function is going to do the same thing). This
>>equates to a few nanoseconds.
>
>now you're bragging a lot of nonsense. Bob is saying he wants to do without
>spinning. and that i am kind of idiot to not have optimized for it, where i did
>measure it already for 32 processors in fact as at NUMA systems you cannot waste
>bandwidth at all.

I have never seen any one source of disinformation that comes even close to
comparing to you.

I _do_ "spin".  I have _always_ "spun".  I will always "spin".  If you can't
grasp the problem with SMT and spinlocks, then I'm not going to waste any more
time trying to explain it.  The rest of us understand the problem (and the
trivial one-instruction solution), which is good enough for me.
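
For readers following along, here is a minimal sketch of a spin lock that
behaves reasonably on SMT.  It assumes the "one-instruction solution" is the x86
PAUSE instruction placed in the spin loop; the post does not name the
instruction, so that is an assumption.  The names spinlock_t, cpu_relax,
spin_trylock, spin_lock and spin_unlock are hypothetical, and this is not
Crafty's actual lock code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
/* PAUSE tells the core this is a spin-wait loop, so the sibling logical
   CPU sharing the core is not starved of execution resources. */
static inline void cpu_relax(void) { _mm_pause(); }
#else
static inline void cpu_relax(void) { /* no-op fallback on other CPUs */ }
#endif

/* Hypothetical lock type: true while some thread holds the lock. */
typedef struct { atomic_bool held; } spinlock_t;
#define SPINLOCK_INIT { false }

/* One attempt to take the lock; returns true on success. */
static inline bool spin_trylock(spinlock_t *l) {
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin until the lock is ours.  No system call is made on any path. */
static inline void spin_lock(spinlock_t *l) {
    while (!spin_trylock(l)) {
        /* Wait on a plain load, pausing on every pass, and only retry
           the atomic exchange once the lock looks free. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            cpu_relax();
    }
}

static inline void spin_unlock(spinlock_t *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Without the pause, a spinning logical processor competes for the same execution
units as the thread that holds the lock, which is the SMT problem referred to
above.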

>
>Crafty keeps the locks longer than DIEP. It locks for all processors which it is
>searching with for a long period of time. DIEP locks only the processor which is
>busy getting a new job and for a short period of time, then it runs on.

Crafty does not keep the locks very long.  The longest operation is a split, and
I do maybe 2K-3K of those _total_ in a 3 minute search.  The time can't even be
measured.  Profiling shows that the "Thread()" code executes maybe 3000 times,
using 0.0ms per call, which is nothing.  The other locks are separate for each
split point, and they are only held long enough to safely extract the move at
the ply the tree was split at for that group of processes.

Total lock time is close enough to zero to call it zero.
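
The per-split-point locking described above can be sketched roughly as follows.
This is an illustration only, not Crafty's data structures: split_point_t,
split_init, split_next_move, MAX_MOVES and NO_MOVE are hypothetical names, and
moves are plain ints for brevity.  The point it tries to show is that the lock
covers nothing but the "take the next move" step.

```c
#include <stdatomic.h>

#define MAX_MOVES 256
#define NO_MOVE   (-1)

/* Hypothetical split-point record.  Each split point has its own lock, so
   processes working at different split points never contend. */
typedef struct {
    atomic_flag lock;
    int moves[MAX_MOVES];   /* moves at the ply where the tree was split */
    int count;              /* number of valid entries in moves[] */
    int next;               /* index of the next move to hand out */
} split_point_t;

static void split_init(split_point_t *sp, const int *moves, int count) {
    atomic_flag_clear(&sp->lock);
    sp->count = count < MAX_MOVES ? count : MAX_MOVES;
    for (int i = 0; i < sp->count; i++)
        sp->moves[i] = moves[i];
    sp->next = 0;
}

/* Hand out the next unsearched move, or NO_MOVE when none are left.
   The lock is held only for the compare and the index increment. */
static int split_next_move(split_point_t *sp) {
    int move = NO_MOVE;
    while (atomic_flag_test_and_set_explicit(&sp->lock, memory_order_acquire))
        ;   /* spin; the holder is only a few instructions from releasing */
    if (sp->next < sp->count)
        move = sp->moves[sp->next++];
    atomic_flag_clear_explicit(&sp->lock, memory_order_release);
    return move;
}
```

Each helper would call split_next_move() in a loop and search whatever it gets
back, with the search itself done entirely outside the lock.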



>
>The common problem all programs have is what to do at startup when 1 processor
>starts the YBW search, so the others are idling for a very short while. The more
>processors you got the bigger this problem is (as you start the YBW+ search with
>just 500Mhz processor or so).
>
>It is here where bob says you should not spin.

What are you talking about?  I _never_ said "you should not spin".  You just
have to do it right for it to work well on SMT.


>
>My argument is that it is impossible to do without spinning under linux because
>the latency is too high. Bob didn't even measure it yet of course.

Bob knows more about linux latency than you do.  Your knowledge seems to be
limited to how to spell the word.  There are multiple definitions of latency,
and I use the one that applies to a dedicated machine with N cpus and N
processes.  That is the case that counts for chess.






>
>Under windows the WaitForSingleObject is faster than what you got under linux.

No idea what you mean.  I can accurately time waits to 1ms using "select()", so
your comment makes about as much sense as yours usually do.
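
A small sketch of how a wait can be timed with select(), for anyone who wants to
measure it on their own machine.  The helper name timed_wait_us is hypothetical,
and the use of clock_gettime(CLOCK_MONOTONIC) for the measurement is an
assumption of this example, not something taken from the post.  The resolution
you actually observe depends on the kernel and its timer configuration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Sleep for about `usec` microseconds by calling select() with no file
   descriptors, and return the number of microseconds that actually
   elapsed, or -1 on error. */
static long timed_wait_us(long usec) {
    struct timespec t0, t1;
    struct timeval tv;

    tv.tv_sec  = usec / 1000000;
    tv.tv_usec = usec % 1000000;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;

    /* With nfds == 0 and all sets NULL, select() is just a timed wait.
       An interrupting signal (EINTR) ends the wait early; we still
       report the time that really passed. */
    if (select(0, NULL, NULL, NULL, &tv) != 0 && errno != EINTR)
        return -1;

    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    return (long)(ns / 1000);
}
```

Calling timed_wait_us(1000) repeatedly and looking at the spread of the returned
values gives a direct picture of how close to 1ms the waits really are.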


>
>Try the 'select' for example under LINUX which is the 'default' solution for
>everything under *nix. It is a horrible thing to use. It even does do a malloc()
>or something in the kernel.
>

What?

No need to even respond to that nonsense.




>Under IRIX you can try to use spin_lock() which is trying for 600 tries busywait
>to get a lock and after that idles a process.

Common solution in many locks.  Spin for a short time, then block.  Totally
reasonable, as it works nicely for the case where you have more processes than
processors.
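
The spin-then-block idea can be sketched on top of a POSIX mutex as below.  This
is not the IRIX spin_lock() implementation; hybrid_lock_t, hybrid_init,
hybrid_acquire and hybrid_release are hypothetical names, and SPIN_TRIES simply
reuses the figure of 600 quoted above.

```c
#include <pthread.h>

#define SPIN_TRIES 600   /* figure quoted above for IRIX; tune as needed */

/* Hypothetical wrapper: a mutex that is polled briefly before blocking. */
typedef struct { pthread_mutex_t m; } hybrid_lock_t;

static int hybrid_init(hybrid_lock_t *l) {
    return pthread_mutex_init(&l->m, NULL);
}

/* Acquire the lock.  Returns 0 if it was obtained while spinning, or 1 if
   the caller had to fall back to a blocking wait. */
static int hybrid_acquire(hybrid_lock_t *l) {
    for (int i = 0; i < SPIN_TRIES; i++) {
        /* trylock never sleeps; it returns nonzero if the lock is busy */
        if (pthread_mutex_trylock(&l->m) == 0)
            return 0;
    }
    /* Still busy after the spin phase: let the kernel put us to sleep so
       the processor can run something else. */
    pthread_mutex_lock(&l->m);
    return 1;
}

static void hybrid_release(hybrid_lock_t *l) {
    pthread_mutex_unlock(&l->m);
}
```

With one process per processor the spin phase almost always succeeds.  With more
processes than processors the blocking fallback keeps a waiter from burning a
whole timeslice while the lock holder is not even running.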





>
>For all these systems after those few tries, which of course usually will fail
>as you do not have a job for at least another few bunch of nodes, then a process
>gets put to idle, after which the only way to wake it up is by means of the
>runqueue. This one fires at 100Hz. So minimum latency is 10ms under all the *nix
>systems.


You are mixing spinlock and spinwait.  They are _not_ the same thing.
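
To make the distinction concrete, here is a sketch of a spin wait as the term is
read here: an idle helper polling a flag until work shows up.  Nothing is locked
and nobody is excluded, which is what separates it from a spin lock guarding a
few instructions of shared data.  The names helper_slot_t, post_work and
spin_wait_for_work are hypothetical, and the max_polls limit is an assumption
added so the example always terminates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-helper slot: set to 1 by a searching thread when it has
   split the tree and there is work for this helper to pick up. */
typedef struct { atomic_int work_ready; } helper_slot_t;

/* Called by the thread that has work to give away. */
static void post_work(helper_slot_t *s) {
    atomic_store_explicit(&s->work_ready, 1, memory_order_release);
}

/* Spin wait: poll the flag up to max_polls times.  Returns true (and
   clears the flag) if work was posted, false if the helper gave up. */
static bool spin_wait_for_work(helper_slot_t *s, long max_polls) {
    for (long i = 0; i < max_polls; i++) {
        if (atomic_load_explicit(&s->work_ready, memory_order_acquire)) {
            atomic_store_explicit(&s->work_ready, 0, memory_order_relaxed);
            return true;
        }
    }
    return false;
}
```

A spin lock is normally held for a handful of instructions, so the spinning is
short.  A spin wait lasts as long as the helper has nothing to do, which is why
the two have different costs and should not be measured as if they were one
thing.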


>
>Your measurements of the quantum of a bunch of threads put at HighPriority is
>therefore complete useless. The alternative for windows is
>WaitForSingleObject().
>
>Note that in DIEP i do without a single system call there. Got too much scared
>after the measurements done.
>



So?  I have _zero_ system calls too...

That was the point of my hand-coded locks.




>>-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700