Computer Chess Club Archives


Subject: Re: hyper-threading at dual xeon 2.8Ghz

Author: Matt Taylor

Date: 10:28:43 02/25/03


On February 25, 2003 at 07:44:23, Vincent Diepeveen wrote:

>On February 23, 2003 at 01:38:55, Matt Taylor wrote:
>
>DIEP is spinning and locking way way less than Crafty. Note that
>it is pretty hard to do without spinning under linux.
>
>The runqueue fires at 100Hz in linux. So the latency for a thread that doesn't
>search and normally is doing all kind of stuff is around 10ms under linux.

Yes, Windows NT is 7.5 ms, and any OS that strives to do better is going to
waste a lot of time in the scheduler.

Spin waits are nearly useless on a single-processor machine. I don't know what
you are doing, but in correctly written code a spin wait never actually spins on
a single-processor machine. The chess engine runs no extra threads there, so no
other engine thread can ever be holding the spin lock. The thread can always
acquire the lock immediately because it's always free (unless you have a bug).
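The uncontended case can be checked with a small instrumented lock. This is only a sketch in portable C++: `CountingSpinLock` and `spins_without_contention` are hypothetical names made up for illustration, not code from Crafty or DIEP. With a single thread taking and releasing the lock, the retry counter should stay at zero.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical instrumented lock: counts how often an acquire had to retry.
struct CountingSpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    long spins = 0;  // plain counter, not thread-safe; fine for this single-threaded demo

    void lock() {
        while (flag.test_and_set(std::memory_order_acquire))
            ++spins;  // reached only when the lock is already held
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// One thread takes and releases the lock `rounds` times; returns the retry count.
long spins_without_contention(int rounds) {
    assert(rounds >= 0);
    CountingSpinLock lock;
    for (int i = 0; i < rounds; ++i) {
        lock.lock();
        lock.unlock();
    }
    return lock.spins;
}
```

Calling `spins_without_contention(1000)` exercises 1000 acquire/release pairs with nobody else competing for the lock, which is the single-threaded situation described above.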

>For crafty 10ms latency is too much to wait for a thread to get fired for sure.
>
>I guess you didn't try to figure out what the cost of it is, otherwise you would
>not write such unprofessional comments like below.

My comment had nothing to do with Crafty vs. Diep. It had everything to do with
comments you made a few months ago about how the Xeon 2.8 GHz was not available
when Bob had one on his desk. I can understand them not being available in
Europe, but you didn't say that. You kept asserting that they didn't exist.

I'd wager most people who read that thread thought it was pretty funny as I did.

>In DIEP under linux i do not idle either. Of course for me 10ms is too expensive
>too. Instead i generate a bunch of attacktables instead an idle process doesn't
>hammer at the same cache line like crafty does.
>
>It speeds DIEP up 20% (in nodes a second) at 32 processors when i do not take
>the 10ms penalty but go for doing something with the registers without hurting
>shared cache lines (so just local allocated stuff).

Ok, but that's unnecessary. A spin wait is a short-duration lock. Crafty gets
the same speedup without having to go do something else while waiting for the
lock.

>Under windows the runqueue fires at 500Hz, so that's 2ms latency. Still a lot,
>but a lot less than 10ms latency. Today i go test what the effect of that is for
>DIEP. I have no dual Xeon to my avail at the moment to test it though. Must do
>with a dual K7 and dual P3 and see what generating 600 attacktables (about 0.5
>ms at the dual k7) just in local ram is going to give versus using
>WaitForSingleObject.

No. On NT it theoretically fires every 7.5 ms (133 Hz). On Win9x, it can fire as
slowly as every 20 ms (50 Hz). I measured the timeslice on Windows XP Professional
just now and got 15 ms. I am inclined to think this is the best XP Professional
gets. Server versions may use different timeslice values, but I don't have a copy
to test with.
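The period and frequency figures in this exchange are just reciprocals of each other, which the following sketch makes explicit. The helper names `tick_period_ms` and `tick_frequency_hz` are made up for illustration.

```cpp
#include <cassert>

// Period in milliseconds of a timer that fires `hz` times per second.
double tick_period_ms(double hz) {
    assert(hz > 0.0);
    return 1000.0 / hz;
}

// Frequency in Hz of a timer that fires once every `ms` milliseconds.
double tick_frequency_hz(double ms) {
    assert(ms > 0.0);
    return 1000.0 / ms;
}
```

With these, 100 Hz corresponds to 10 ms, 500 Hz to 2 ms, 50 Hz to 20 ms, and 133 Hz to roughly 7.5 ms. The 15 ms measured on XP would correspond to roughly 67 Hz.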

Code follows at the end of this message; please cut it when replying. Oh -- and
I recommend -never- programming like that. It's not bad for 20 minutes of work
including some debugging and a fix for SMP, but it can do really nasty things to
your system such as not being able to get into task manager to terminate it...

Too bad Windows's scheduler isn't fair.
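For a rough number without starving the machine, a gentler (and different) technique is to request a very short sleep and time how long the wake-up really takes. This is a portable C++ sketch, not the method used in the listing at the end of this message: it measures sleep wake-up latency rather than the timeslice itself, and the names `measured_sleep_ms` and `worst_sleep_ms` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Ask for a sleep of `requested_ms` and report how long it really took, in ms.
double measured_sleep_ms(int requested_ms) {
    assert(requested_ms >= 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count();
}

// Take several samples and keep the largest, since individual wake-ups vary.
double worst_sleep_ms(int requested_ms, int samples) {
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double ms = measured_sleep_ms(requested_ms);
        if (ms > worst) worst = ms;
    }
    return worst;
}
```

On a system with a coarse scheduler tick, `worst_sleep_ms(1, 20)` would be expected to come back well above 1 ms. The exact value depends on the OS, its timer resolution, and the load at the time.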

>So for processes that let threads idle instead of letting them spin, that is a
>complete pathetic idea for realtime environments.
<snip>

Realtime has nothing to do with it. Spin locks can be used in real-time
programs. The idea behind a spin lock is that it is a -short- wait, probably
shorter than the time required to transition into kernel mode. Spin locks are
used all over SMP kernels, particularly in drivers which are as close to
real-time as the PC architecture usually comes.
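As an illustration of the short-wait idea, here is a minimal test-and-test-and-set spin lock in portable C++ guarding a very short critical section. It is a sketch under assumptions, not Crafty's or any kernel's lock: `SpinLock` and `count_with_spinlock` are hypothetical names, and `std::this_thread::yield()` stands in for the x86 `pause` hint, which standard C++ does not expose.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock for short critical sections.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // exchange returns the previous value; false means we now own the lock.
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Contended: wait on plain loads until the holder releases the lock.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();  // stand-in for `pause`
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Each thread increments a shared counter `iterations` times under the lock.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // the whole critical section is one increment
                lock.unlock();
            }
        });
    for (auto &th : pool) th.join();
    return counter;
}
```

The critical section here is a single increment, so a waiting thread normally gets the lock almost immediately. If the lock were held for a long time, blocking in the kernel would be the better choice.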

In a single processor system, it is a dumb idea as you pointed out, but I don't
think that's news to Bob, and that's not news to me. I haven't even been
programming for 20 years, and he's been doing parallel research for that long.

-Matt

>>>Did you make the necessary changes to spinlocks and spinwaits???
>>
>>Sorry, can't resist a good laugh!
>>
>>"No, they're not out yet!"
>>
>>:-)
>>
>>-Matt

<-- cut here -->
#include <windows.h>
#include <stdio.h>
#include <conio.h>

typedef unsigned __int64 uint64;

DWORD WINAPI IdleThread(LPVOID lpParam);

int main(void)
{
	uint64 clkspeed, dclocks, dbound, freq, dtime;
	SYSTEM_INFO sysInfo;
	HANDLE hThread[2]; // adjust for SMP system
	DWORD dwTID[2];

	QueryPerformanceFrequency((LARGE_INTEGER *) &freq);

	_asm
	{
		lea	eax, dtime
		push	eax
		push	1000
		push	eax
		call	DWORD PTR [QueryPerformanceCounter]
		rdtsc
		mov	esi, eax
		mov	edi, edx
		call	DWORD PTR [Sleep]
		rdtsc
		sub	eax, esi
		sbb	edx, edi
		mov	esi, DWORD PTR [dtime]
		mov	edi, DWORD PTR [dtime+4]
		mov	DWORD PTR [dclocks], eax
		mov	DWORD PTR [dclocks+4], edx
		call	DWORD PTR [QueryPerformanceCounter]
		sub	DWORD PTR [dtime], esi
		sbb	DWORD PTR [dtime+4], edi
	}

	clkspeed = (uint64)((double) dclocks * (double) freq / (double) dtime);
	dbound = clkspeed / 1000;

	SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

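	GetSystemInfo(&sysInfo);
	// hThread[] and dwTID[] hold two entries; bail out rather than overrun them.
	if(sysInfo.dwNumberOfProcessors > 2)
	{
		printf("Enlarge hThread/dwTID for %lu processors\n", sysInfo.dwNumberOfProcessors);
		return 1;
	}
	for(int i = 0; i < (int) sysInfo.dwNumberOfProcessors; i++)
		hThread[i] = CreateThread(NULL, 4096, IdleThread, NULL, 0, &dwTID[i]);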

	while(!kbhit())
	{
		_asm
		{
			push	0
			push	0

			call	DWORD PTR [Sleep]

			rdtsc
			mov		DWORD PTR [dclocks], eax
			mov		DWORD PTR [dclocks+4], edx

			call	DWORD PTR [Sleep]

TimeSliceLoop:
			pause
			rdtsc
			sub		eax, DWORD PTR [dbound]
			sbb		edx, DWORD PTR [dbound+4]
			sub		edx, DWORD PTR [dclocks+4]
			ja		TimeSliceElapsed
			sub		eax, DWORD PTR [dclocks]
			jna		TimeSliceLoop

TimeSliceElapsed:
			sbb		edx, 0
			add		eax, DWORD PTR [dbound]
			adc		edx, DWORD PTR [dbound+4]
			mov		DWORD PTR [dclocks], eax
			mov		DWORD PTR [dclocks+4], edx
		}

		printf("Timeslice was: %d msec\n", (int)((dclocks * 1000) / clkspeed));
	}

	for(int i = 0; i < sysInfo.dwNumberOfProcessors; i++)
		TerminateThread(hThread[i], 0);

	return 0;
}

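DWORD WINAPI IdleThread(LPVOID lpParam)
{
	// main() has already raised the process to HIGH_PRIORITY_CLASS; this call is redundant but harmless.
	SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
	while(1);	// burn the CPU so the main thread always has a competitor to yield to

	return 0;
}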




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.