Computer Chess Club Archives


Subject: Re: hyper-threading at dual xeon 2.8Ghz

Author: Robert Hyatt

Date: 12:30:25 02/26/03


On February 25, 2003 at 16:23:05, Vincent Diepeveen wrote:

>On February 25, 2003 at 13:28:43, Matt Taylor wrote:
>
>i asked m$ kernel team member and he told me newer NT kernels have 2ms latency
>to wake up a process. If you measure 7.5ms somehow that surprises me quite some.
>I was told it was 2ms for latest NT kernels.


Vincent, you need to read before asking questions.  "2ms to wake up a process"
is a _far_ different thing than what you suggested earlier.  That is a measure
of how long it takes to start executing a process once it has been flagged as
"ready".  It includes the time to suspend the current (presumably
lower-priority) process and then context-switch to the new process.

In chess, this is meaningless, because the processor in question is _idle_.
That is why I use spinlocks and spinwaits: I have _zero_ latency, which is
what I want...
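
To make the spinlock/spinwait point concrete, here is a minimal sketch in portable C11. It is not Crafty's actual code; the names `demo_spinlock`, `spin_trylock`, `spin_lock`, `spin_unlock` and `spin_wait_for_work` are hypothetical, and it assumes a compiler with `<stdatomic.h>`. The waiting processor never enters the kernel, so it notices the change as soon as the other processor's store becomes visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration type, not taken from any engine's source. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

void spin_init(demo_spinlock *l) {
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* One attempt: returns true if the lock was free and is now ours. */
bool spin_trylock(demo_spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spinlock: retry in user space until the holder releases the lock. */
void spin_lock(demo_spinlock *l) {
    while (!spin_trylock(l)) {
        /* busy-wait; no system call, so no scheduler wake-up is involved */
    }
}

void spin_unlock(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Spinwait: an idle searcher polls a shared word until work is posted. */
int spin_wait_for_work(atomic_int *work) {
    int w;
    while ((w = atomic_load_explicit(work, memory_order_acquire)) == 0) {
        /* keep polling; returns as soon as another thread stores non-zero */
    }
    return w;
}
```

The trade-off is the one argued over in this thread: the waiting processor burns cycles and keeps reading a shared cache line, in exchange for not paying any scheduler wake-up time.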

>
>My own tests show that the scheduler from windows at NT is about 2-3 times
>faster than the latency that it gets under linux. Of course it is possible that
>it ain't 10ms under windows but like 21, i didn't test absolute speeds. i just
>tested relative speeds ;)


Right.  Tested how?  I'm sure it was an _accurate_ test...
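
For what it's worth, one way such a number could be measured is sketched below. This is only an assumption about a reasonable test, not what anyone in this thread actually ran. It uses the POSIX calls `clock_gettime` and `nanosleep` (so it is a Linux-side sketch), and the function names `elapsed_ns` and `sleep_overshoot_ns` are hypothetical. It reports how much longer than requested a short sleep takes, which bundles timer granularity together with wake-up time, so it is only a rough proxy for the "ready to running" latency described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Nanoseconds from a to b. */
long long elapsed_ns(struct timespec a, struct timespec b) {
    return (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (long long)(b.tv_nsec - a.tv_nsec);
}

/* Sleep for request_ns and return how far past the request we woke up. */
long long sleep_overshoot_ns(long request_ns) {
    assert(request_ns >= 0 && request_ns < 1000000000L);
    struct timespec req = {0, request_ns};
    struct timespec rem, t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* If a signal interrupts the sleep, resume with the remaining time. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_ns(t0, t1) - request_ns;
}
```

A single call says little; one would repeat it many times and look at the distribution, on an otherwise idle machine and again under load.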



>
>>On February 25, 2003 at 07:44:23, Vincent Diepeveen wrote:
>>
>>>On February 23, 2003 at 01:38:55, Matt Taylor wrote:
>>>
>>>DIEP is spinning and locking way way less than Crafty. Note that
>>>it is pretty hard to do without spinning under linux.
>>>
>>>The runqueue fires at 100Hz in linux. So the latency for a thread that doesn't
>>>search and normally is doing all kind of stuff is around 10ms under linux.
>>
>>Yes, Windows NT is 7.5 ms, and any OS that strives to do better is going to
>>waste a lot of time in the scheduler.
>>
>>Spin waits are nearly useless on a single-processor machine. I don't know what
>>you are doing, but a spin wait never occurs in an application on a
>>single-processor machine when the code is written correctly. Since the chess
>>engine has no extra threads, there will never be another engine thread that has
>>the spin lock. The lock will never actually spin -- the thread can always
>>acquire the lock because it's always free (unless you have a bug).
>>
>>>For crafty 10ms latency is too much to wait for a thread to get fired for sure.
>>>
>>>I guess you didn't try to figure out what the cost of it is, otherwise you would
>>>not write such unprofessional comments like below.
>>
>>My comment had nothing to do with Crafty vs. Diep. It had everything to do with
>>comments you made a few months ago about how the Xeon 2.8 GHz was not available
>>when Bob had one on his desk. I can understand them not being available in
>>Europe, but you didn't say that. You kept asserting that they didn't exist.
>>
>>I'd wager most people who read that thread thought it was pretty funny as I did.
>>
>>>In DIEP under linux i do not idle either. Of course for me 10ms is too expensive
>>>too. Instead i generate a bunch of attacktables, so an idle process doesn't
>>>hammer at the same cache line like crafty does.
>>>
>>>It speeds DIEP up 20% (in nodes a second) at 32 processors when i do not take
>>>the 10ms penalty but go for doing something with the registers without hurting
>>>shared cache lines (so just local allocated stuff).
>>
>>Ok, but that's unnecessary. A spin wait is a short-duration lock. Crafty gets
>>the same speedup without having to go do something else while waiting for the
>>lock.
>>
>>>Under windows the runqueue fires at 500Hz, so that's 2ms latency. Still a lot,
>>>but a lot less than 10ms latency. Today i go test what the effect of that is for
>>>DIEP. I have no dual Xeon to my avail at the moment to test it though. Must do
>>>with a dual K7 and dual P3 and see what generating 600 attacktables (about 0.5
>>>ms at the dual k7) just in local ram is going to give versus using
>>>WaitForSingleObject.
>>
>>No. On NT it theoretically fires every 7.5 ms (133 Hz). On Win9x, it can fire as
>>slow as 20 ms (50 Hz). I measured the time on Windows XP Professional just now
>>and I got 15 ms. I am inclined to think this is the best XP Professional gets.
>>Server versions may use different timeslice values, but I don't have a copy to
>>test with.
>
>I do not know whether he meant SERVER version or PROFESSIONAL version for the
>2ms wake up time.
>
>>Code follows at the end of this message, please cut it when replying. Oh -- and
>>I recommend -never- programming like that. It's not bad for 20 minutes of work
>>including some debugging and a fix for SMP, but it can do really nasty things to
>>your system such as not being able to get into task manager to terminate it...
>>
>>Too bad Windows's scheduler isn't fair.
>>
>>>So for processes that let threads idle instead of letting them spin, that is a
>>>complete pathetic idea for realtime environments.
>><snip>
>>
>>Realtime has nothing to do with it. Spin locks can be used in real-time
>>programs. The idea behind a spin lock is that it is a -short- wait, probably
>>shorter than the time required to transition into kernel mode. Spin locks are
>
>Anything that needs kernel functions to let your process search on is bad simply
>nowadays. Kernels really are outdated in some ways.
>
>>used all over SMP kernels, particularly in drivers which are as close to
>>real-time as the PC architecture usually comes.
>>
>>In a single processor system, it is a dumb idea as you pointed out, but I don't
>>think that's news to Bob, and that's not news to me. I haven't even been
>>programming for 20 years, and he's been doing parallel research for that long.
>
>In fact in supercomputers it is far dumber to let stuff idle than in single cpu
>systems. Of course you use up less 'testing cpu clock ticks time'. or whatever
>they call it. But you are slower simply.
>
>20% slower at 32 processors is a lot... ...chessprograms split a lot each
>second.
>
>>-Matt
>>
>>>>>Did you make the necessary changes to spinlocks and spinwaits???
>>>>
>>>>Sorry, can't resist a good laugh!
>>>>
>>>>"No, they're not out yet!"
>>>>
>>>>:-)
>>>>
>>>>-Matt
>>
>><-- cut here -->
>>#include <windows.h>
>>#include <stdio.h>
>>#include <conio.h>
>>
>>typedef unsigned __int64 uint64;
>>
>>DWORD WINAPI IdleThread(LPVOID lpParam);
>>
>>int main(void)
>>{
>>	uint64 clkspeed, dclocks, dbound, freq, dtime;
>>	SYSTEM_INFO sysInfo;
>>	HANDLE hThread[2]; // adjust for SMP system
>>	DWORD dwTID[2];
>>
>>	QueryPerformanceFrequency((LARGE_INTEGER *) &freq);
>>
>>	_asm
>>	{
>>		lea	eax, dtime
>>		push	eax
>>		push	1000
>>		push	eax
>>		call	DWORD PTR [QueryPerformanceCounter]
>>		rdtsc
>>		mov	esi, eax
>>		mov	edi, edx
>>		call	DWORD PTR [Sleep]
>>		rdtsc
>>		sub	eax, esi
>>		sbb	edx, edi
>>		mov	esi, DWORD PTR [dtime]
>>		mov	edi, DWORD PTR [dtime+4]
>>		mov	DWORD PTR [dclocks], eax
>>		mov	DWORD PTR [dclocks+4], edx
>>		call	DWORD PTR [QueryPerformanceCounter]
>>		sub	DWORD PTR [dtime], esi
>>		sbb	DWORD PTR [dtime+4], edi
>>	}
>>
>>	clkspeed = (uint64)((double) dclocks * (double) freq / (double) dtime);
>>	dbound = clkspeed / 1000;
>>
>>	SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
>>
>>	GetSystemInfo(&sysInfo);
>>	for(int i = 0; i < sysInfo.dwNumberOfProcessors; i++)
>>		hThread[i] = CreateThread(NULL, 4096, IdleThread, NULL, 0, &dwTID[i]);
>>
>>	while(!kbhit())
>>	{
>>		_asm
>>		{
>>			push	0
>>			push	0
>>
>>			call	DWORD PTR [Sleep]
>>
>>			rdtsc
>>			mov		DWORD PTR [dclocks], eax
>>			mov		DWORD PTR [dclocks+4], edx
>>
>>			call	DWORD PTR [Sleep]
>>
>>TimeSliceLoop:
>>			pause
>>			rdtsc
>>			sub		eax, DWORD PTR [dbound]
>>			sbb		edx, DWORD PTR [dbound+4]
>>			sub		edx, DWORD PTR [dclocks+4]
>>			ja		TimeSliceElapsed
>>			sub		eax, DWORD PTR [dclocks]
>>			jna		TimeSliceLoop
>>
>>TimeSliceElapsed:
>>			sbb		edx, 0
>>			add		eax, DWORD PTR [dbound]
>>			adc		edx, DWORD PTR [dbound+4]
>>			mov		DWORD PTR [dclocks], eax
>>			mov		DWORD PTR [dclocks+4], edx
>>		}
>>
>>		printf("Timeslice was: %d msec\n", (int)((dclocks * 1000) / clkspeed));
>>	}
>>
>>	for(int i = 0; i < sysInfo.dwNumberOfProcessors; i++)
>>		TerminateThread(hThread[i], 0);
>>
>>	return 0;
>>}
>>
>>DWORD WINAPI IdleThread(LPVOID lpParam)
>>{
>>	SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
>>	while(1);
>>
>>	return 0;
>>}
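
The tick rates and latencies quoted in this thread are the same numbers in two units: a periodic tick at f Hz has a period of 1000/f milliseconds. A small sketch of that conversion follows; the function name `tick_period_ms` is hypothetical and the code makes no claim about what any particular kernel actually does.

```c
#include <assert.h>

/* Period of a periodic tick in milliseconds: 1000 ms divided by the rate in Hz. */
double tick_period_ms(double hz) {
    assert(hz > 0.0);
    return 1000.0 / hz;
}
```

With the figures quoted above, 100 Hz gives 10 ms, 500 Hz gives 2 ms, 50 Hz gives 20 ms, and 133 Hz gives about 7.5 ms. Going the other way, a measured 15 ms corresponds to roughly 67 Hz.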



Last modified: Thu, 15 Apr 21 08:11:13 -0700
