Computer Chess Club Archives


Subject: Re: hyper-threading at dual Xeon 2.8 GHz

Author: Vincent Diepeveen

Date: 06:07:31 02/26/03


On February 25, 2003 at 21:14:06, Matt Taylor wrote:

>On February 25, 2003 at 16:23:05, Vincent Diepeveen wrote:
>
>>On February 25, 2003 at 13:28:43, Matt Taylor wrote:
>>
>>I asked a member of Microsoft's kernel team, and he told me that newer NT kernels
>>have a 2 ms latency to wake up a process. If you somehow measure 7.5 ms, that
>>surprises me quite a bit.
>
>7.5 ms is the published value, though I can't remember whether it was
>reverse-engineered or documented in MSDN. The NT source is available for
>academic purposes if you're willing to sign an NDA.

I never sign NDAs like that.

>>My own tests show that the Windows NT scheduler wakes threads about 2-3 times
>>faster than Linux does. Of course it is possible that it isn't 10 ms under
>>Windows but more like 21; I didn't test absolute speeds, only relative speeds ;)
>
>Again, Windows XP Professional is 15 ms for me.

You are testing the wrong thing, dude.

>>>On February 25, 2003 at 07:44:23, Vincent Diepeveen wrote:
>>>
>>>>On February 23, 2003 at 01:38:55, Matt Taylor wrote:
>>>>
>>>>DIEP spins and locks far less than Crafty. Note that it is pretty hard to
>>>>avoid spinning under Linux.
>>>>
>>>>The run queue fires at 100 Hz in Linux, so the wake-up latency for a thread
>>>>that isn't searching (and is normally doing all kinds of other stuff) is around
>>>>10 ms under Linux.
>>>
>>>Yes, Windows NT is 7.5 ms, and any OS that strives to do better is going to
>>>waste a lot of time in the scheduler.
>>>
>>>Spin waits are nearly useless on a single-processor machine. I don't know what
>>>you are doing, but a spin wait never occurs in an application on a
>>>single-processor machine when the code is written correctly. Since the chess
>>>engine has no extra threads, there will never be another engine thread that has
>>>the spin lock. The lock will never actually spin -- the thread can always
>>>acquire the lock because it's always free (unless you have a bug).
>>>
>>>>For Crafty, 10 ms is certainly too long to wait for a thread to be woken up.
>>>>
>>>>I guess you didn't try to figure out what it costs; otherwise you would not
>>>>write unprofessional comments like the one below.
>>>
>>>My comment had nothing to do with Crafty vs. Diep. It had everything to do with
>>>comments you made a few months ago about how the Xeon 2.8 GHz was not available
>>>when Bob had one on his desk. I can understand them not being available in
>>>Europe, but you didn't say that. You kept asserting that they didn't exist.
>>>
>>>I'd wager most people who read that thread thought it was pretty funny, as I did.
>>>
>>>>In DIEP under Linux I do not idle either; of course 10 ms is too expensive for
>>>>me too. Instead I generate a bunch of attack tables, so an idle process doesn't
>>>>hammer at the same cache line the way Crafty does.
>>>>
>>>>It speeds DIEP up 20% (in nodes per second) at 32 processors when I avoid the
>>>>10 ms penalty and instead do something with the registers that doesn't touch
>>>>shared cache lines (so only locally allocated data).
>>>
>>>Ok, but that's unnecessary. A spin wait is a short-duration lock. Crafty gets
>>>the same speedup without having to go do something else while waiting for the
>>>lock.
>>>
>>>>Under Windows the run queue fires at 500 Hz, so that's 2 ms latency: still a
>>>>lot, but much less than 10 ms. Today I am going to test what effect that has
>>>>on DIEP. I have no dual Xeon available to test on at the moment, though, so I
>>>>must make do with a dual K7 and a dual P3 and see what generating 600 attack
>>>>tables (about 0.5 ms on the dual K7) in local RAM gives versus using
>>>>WaitForSingleObject.
>>>
>>>No. On NT it theoretically fires every 7.5 ms (133 Hz). On Win9x, it can fire as
>>>slow as 20 ms (50 Hz). I measured the time on Windows XP Professional just now
>>>and I got 15 ms. I am inclined to think this is the best XP Professional gets.
>>>Server versions may use different timeslice values, but I don't have a copy to
>>>test with.
>>
>>I do not know whether he meant the Server or the Professional version for the
>>2 ms wake-up time.
>
>2 ms is pretty short. It is possible he meant 2 ms...but I don't think any PC or
>server uses 2 ms timeslices.
>
>>>Code follows at the end of this message, please cut it when replying. Oh -- and
>>>I recommend -never- programming like that. It's not bad for 20 minutes of work
>>>including some debugging and a fix for SMP, but it can do really nasty things to
>>>your system such as not being able to get into task manager to terminate it...
>>>
>>>Too bad Windows's scheduler isn't fair.
>>>
>>>>So letting threads idle instead of spinning is a completely pathetic idea for
>>>>real-time environments.
>>><snip>
>>>
>>>Realtime has nothing to do with it. Spin locks can be used in real-time
>>>programs. The idea behind a spin lock is that it is a -short- wait, probably
>>>shorter than the time required to transition into kernel mode. Spin locks are
>>
>>Anything that needs kernel functions to let your process keep searching is
>>simply bad nowadays. Kernels really are outdated in some ways.
>
>Crafty has its own spin lock code. It does not use the OS for it. This is why
>Bob had to modify Crafty for HT. If the OS provided the spin locks, the OS would
>have to be modified, not Crafty.
>
>>>used all over SMP kernels, particularly in drivers which are as close to
>>>real-time as the PC architecture usually comes.
>>>
>>>In a single processor system, it is a dumb idea as you pointed out, but I don't
>>>think that's news to Bob, and that's not news to me. I haven't even been
>>>programming for 20 years, and he's been doing parallel research for that long.
>>
>>In fact, on supercomputers it is far dumber to let threads idle than on
>>single-CPU systems. Of course you use up less 'testing CPU clock ticks time', or
>>whatever they call it, but you are simply slower.
>>
>>20% slower at 32 processors is a lot; chess programs split many times each
>>second.
><snip>
>
>A few cycles is a penalty gladly paid. If Bob doesn't pay it in Crafty, the OS
>will pay it for him, with its own overhead on -top- of that. Unless you can
>avoid race conditions, you have to employ some sort of synchronization. The spin
>lock is the best tool for this job because it wastes the least amount of time.
>It's protecting data that isn't locked for very long.
>
>-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.