Author: Vincent Diepeveen
Date: 13:23:05 02/25/03
On February 25, 2003 at 13:28:43, Matt Taylor wrote:
I asked a member of the Microsoft kernel team, and he told me that newer NT
kernels have a 2 ms latency to wake up a process. If you measure 7.5 ms, that
surprises me quite a bit. I was told it was 2 ms for the latest NT kernels.
My own tests show that the scheduler in Windows NT wakes a process about 2-3
times faster than the scheduler in Linux does. Of course it is possible that it
isn't 10 ms under Windows but something like 21; I didn't test absolute speeds,
I only tested relative speeds ;)
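A minimal sketch of how such a relative measurement could be made, assuming a POSIX system with clock_gettime and nanosleep. It is not the test referred to above, and the function names are made up for illustration. It requests a short sleep and reports how long the call really took; comparing that number between two systems gives a relative wake-up latency.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Request a sleep of request_ns nanoseconds and return how long the call
   actually took, in nanoseconds, measured with the monotonic clock.
   (Hypothetical helper, not taken from DIEP or Crafty.) */
long long timed_sleep_ns(long request_ns)
{
    struct timespec req, t0, t1;

    req.tv_sec = request_ns / 1000000000L;
    req.tv_nsec = request_ns % 1000000000L;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (long long)(t1.tv_nsec - t0.tv_nsec);
}

/* Average the real duration of `runs` sleeps, in milliseconds.
   The amount by which this exceeds the request is the wake-up overhead. */
double average_sleep_ms(long request_ns, int runs)
{
    long long total_ns = 0;
    int i;

    for (i = 0; i < runs; i++)
        total_ns += timed_sleep_ns(request_ns);

    return (double)total_ns / (double)runs / 1.0e6;
}
```

Calling average_sleep_ms(1000000L, 100) asks for 1 ms sleeps; on a kernel with a coarse tick the average comes out well above 1 ms.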
>On February 25, 2003 at 07:44:23, Vincent Diepeveen wrote:
>
>>On February 23, 2003 at 01:38:55, Matt Taylor wrote:
>>
>>DIEP is spinning and locking way way less than Crafty. Note that
>>it is pretty hard to do without spinning under linux.
>>
>>The runqueue fires at 100Hz in linux. So the latency for a thread that doesn't
>>search and normally is doing all kind of stuff is around 10ms under linux.
>
>Yes, Windows NT is 7.5 ms, and any OS that strives to do better is going to
>waste a lot of time in the scheduler.
>
>Spin waits are nearly useless on a single-processor machine. I don't know what
>you are doing, but a spin wait never occurs in an application on a
>single-processor machine when the code is written correctly. Since the chess
>engine has no extra threads, there will never be another engine thread that has
>the spin lock. The lock will never actually spin -- the thread can always
>acquire the lock because it's always free (unless you have a bug).
>
>>For crafty 10ms latency is too much to wait for a thread to get fired for sure.
>>
>>I guess you didn't try to figure out what the cost of it is, otherwise you would
>>not write such unprofessional comments like below.
>
>My comment had nothing to do with Crafty vs. Diep. It had everything to do with
>comments you made a few months ago about how the Xeon 2.8 GHz was not available
>when Bob had one on his desk. I can understand them not being available in
>Europe, but you didn't say that. You kept asserting that they didn't exist.
>
>I'd wager most people who read that thread thought it was pretty funny as I did.
>
>>In DIEP under Linux I do not idle either. Of course for me 10 ms is too
>>expensive too. Instead I generate a bunch of attack tables, so that an idle
>>process doesn't hammer on the same cache line like Crafty does.
>>
>>It speeds DIEP up 20% (in nodes a second) at 32 processors when i do not take
>>the 10ms penalty but go for doing something with the registers without hurting
>>shared cache lines (so just local allocated stuff).
>
>Ok, but that's unnecessary. A spin wait is a short-duration lock. Crafty gets
>the same speedup without having to go do something else while waiting for the
>lock.
>
>>Under Windows the runqueue fires at 500 Hz, so that's 2 ms latency. Still a
>>lot, but a lot less than 10 ms latency. Today I will test what effect that has
>>on DIEP. I have no dual Xeon available at the moment to test it, though, so I
>>must make do with a dual K7 and a dual P3 and see what generating 600 attack
>>tables (about 0.5 ms on the dual K7) purely in local RAM gives versus using
>>WaitForSingleObject.
>
>No. On NT it theoretically fires every 7.5 ms (133 Hz). On Win9x, it can fire as
>slow as 20 ms (50 Hz). I measured the time on Windows XP Professional just now
>and I got 15 ms. I am inclined to think this is the best XP Professional gets.
>Server versions may use different timeslice values, but I don't have a copy to
>test with.
I do not know whether he meant the Server version or the Professional version
when he quoted the 2 ms wake-up time.
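The latency figures argued over in this thread are just the period of the scheduler tick. A small sketch of that arithmetic, assuming a sleeping thread can only be woken on a tick boundary (the function name is made up for illustration):

```c
/* Period of the scheduler tick in milliseconds. Assumption: a sleeping
   thread is only woken on a tick, so this is the worst-case wait. */
double tick_period_ms(double tick_hz)
{
    return 1000.0 / tick_hz;
}
```

With the rates quoted here, 100 Hz gives 10 ms, 500 Hz gives 2 ms, 50 Hz gives 20 ms, and 133 Hz gives roughly 7.5 ms.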
>Code follows at the end of this message, please cut it when replying. Oh -- and
>I recommend -never- programming like that. It's not bad for 20 minutes of work
>including some debugging and a fix for SMP, but it can do really nasty things to
>your system such as not being able to get into task manager to terminate it...
>
>Too bad Windows's scheduler isn't fair.
>
>>So letting threads idle instead of letting them spin is a completely pathetic
>>idea for realtime environments.
><snip>
>
>Realtime has nothing to do with it. Spin locks can be used in real-time
>programs. The idea behind a spin lock is that it is a -short- wait, probably
>shorter than the time required to transition into kernel mode. Spin locks are
Anything that needs kernel functions before your process can search on is simply
bad nowadays. Kernels really are outdated in some ways.
>used all over SMP kernels, particularly in drivers which are as close to
>real-time as the PC architecture usually comes.
>
>In a single processor system, it is a dumb idea as you pointed out, but I don't
>think that's news to Bob, and that's not news to me. I haven't even been
>programming for 20 years, and he's been doing parallel research for that long.
In fact, on supercomputers it is far dumber to let threads idle than on
single-CPU systems. Of course idling uses up less 'testing cpu clock ticks
time', or whatever they call it. But you are simply slower.
20% slower at 32 processors is a lot... chess programs split many times each
second.
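A minimal sketch of the "do local work instead of idling" idea, assuming C11 atomics. This is not DIEP's or Crafty's code; every name is hypothetical, and the callback stands in for something like generating attack tables in local memory. Each lock attempt still touches the shared cache line, but the local work between attempts spaces those accesses out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spin lock; initialise with { ATOMIC_FLAG_INIT }. */
typedef struct { atomic_flag held; } demo_spinlock;

/* Callback type for the work done between lock attempts. */
typedef void (*local_work_fn)(void *ctx);

/* Try once to take the lock; returns true on success. */
bool demo_try_lock(demo_spinlock *lk)
{
    return !atomic_flag_test_and_set_explicit(&lk->held, memory_order_acquire);
}

/* Release the lock. */
void demo_unlock(demo_spinlock *lk)
{
    atomic_flag_clear_explicit(&lk->held, memory_order_release);
}

/* Stand-in for real local work: just counts how often it was called. */
void demo_count_work(void *ctx)
{
    ++*(unsigned long *)ctx;
}

/* Acquire the lock without sleeping in the kernel. While the lock is
   taken, run one unit of local work per failed attempt. Returns the
   number of work units that were run before the lock was acquired. */
unsigned long demo_lock_with_local_work(demo_spinlock *lk,
                                        local_work_fn work, void *ctx)
{
    unsigned long rounds = 0;

    while (!demo_try_lock(lk)) {
        work(ctx);
        rounds++;
    }
    return rounds;
}
```

On a single processor with one engine thread the lock is always free, so the loop body never runs and the function returns 0, which matches Matt's point about uniprocessor machines.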
>-Matt
>
>>>>Did you make the necessary changes to spinlocks and spinwaits???
>>>
>>>Sorry, can't resist a good laugh!
>>>
>>>"No, they're not out yet!"
>>>
>>>:-)
>>>
>>>-Matt
>
><-- cut here -->
>#include <windows.h>
>#include <stdio.h>
>#include <conio.h>
>
>typedef unsigned __int64 uint64;
>
>DWORD WINAPI IdleThread(LPVOID lpParam);
>
>int main(void)
>{
> uint64 clkspeed, dclocks, dbound, freq, dtime;
> SYSTEM_INFO sysInfo;
> HANDLE hThread[64]; // one idle thread per CPU; dwNumberOfProcessors is at most 64
> DWORD dwTID[64];
>
> QueryPerformanceFrequency((LARGE_INTEGER *) &freq);
>
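> _asm
> {
> lea eax, dtime ; eax = &dtime
> push eax ; argument for the second QueryPerformanceCounter call
> push 1000 ; argument for Sleep: 1000 ms
> push eax ; argument for the first QueryPerformanceCounter call
> call DWORD PTR [QueryPerformanceCounter] ; dtime = start counter
> rdtsc ; edx:eax = TSC at start
> mov esi, eax
> mov edi, edx
> call DWORD PTR [Sleep] ; sleep for about one second
> rdtsc ; edx:eax = TSC at end
> sub eax, esi
> sbb edx, edi ; edx:eax = elapsed TSC clocks
> mov esi, DWORD PTR [dtime] ; edi:esi = start counter
> mov edi, DWORD PTR [dtime+4]
> mov DWORD PTR [dclocks], eax
> mov DWORD PTR [dclocks+4], edx
> call DWORD PTR [QueryPerformanceCounter] ; dtime = end counter
> sub DWORD PTR [dtime], esi
> sbb DWORD PTR [dtime+4], edi ; dtime = elapsed counter ticks
> }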
>
> clkspeed = (uint64)((double) dclocks * (double) freq / (double) dtime);
> dbound = clkspeed / 1000;
>
> SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
>
> GetSystemInfo(&sysInfo);
> for(int i = 0; i < sysInfo.dwNumberOfProcessors; i++)
> hThread[i] = CreateThread(NULL, 4096, IdleThread, NULL, 0, &dwTID[i]);
>
> while(!kbhit())
> {
> _asm
> {
> push 0
> push 0
>
> call DWORD PTR [Sleep]
>
> rdtsc
> mov DWORD PTR [dclocks], eax
> mov DWORD PTR [dclocks+4], edx
>
> call DWORD PTR [Sleep]
>
>TimeSliceLoop:
> pause
> rdtsc
> sub eax, DWORD PTR [dbound]
> sbb edx, DWORD PTR [dbound+4]
> sub edx, DWORD PTR [dclocks+4]
> ja TimeSliceElapsed
> sub eax, DWORD PTR [dclocks]
> jna TimeSliceLoop
>
>TimeSliceElapsed:
> sbb edx, 0
> add eax, DWORD PTR [dbound]
> adc edx, DWORD PTR [dbound+4]
> mov DWORD PTR [dclocks], eax
> mov DWORD PTR [dclocks+4], edx
> }
>
> printf("Timeslice was: %d msec\n", (int)((dclocks * 1000) / clkspeed));
> }
>
> for(int i = 0; i < sysInfo.dwNumberOfProcessors; i++)
> TerminateThread(hThread[i], 0);
>
> return 0;
>}
>
>DWORD WINAPI IdleThread(LPVOID lpParam)
>{
> SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
> while(1);
>
> return 0;
>}