Computer Chess Club Archives



Subject: performance counting

Author: Gerd Isenberg

Date: 12:16:19 09/14/04


Hi all,

For the first time I played a bit more with the x86 rdtsc instruction, which
reads the time stamp counter, to measure the number of cycles a routine takes.

There are some pitfalls when using the time stamp counter, which the hardware
increments every cycle. When the test routine runs for the very first time, the
measurement of course also includes loading its code and data cache lines.
It is therefore necessary to run the measured code in a loop at least twice.

Otherwise the times are noticeably worse and vary from run to run.
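
The warm-up rule can be sketched as a small post-processing step. This is only an illustration and not part of the posted framework: it assumes the per-run cycle counts were already collected into an array, and the name min_warm_cycles is made up for this example. It ignores the first (cold) run and reports the smallest of the remaining counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the posted framework.
   samples[0] is the cold run (cache line loads included) and is ignored;
   the minimum of the remaining warm runs is returned. */
unsigned int min_warm_cycles(const unsigned int *samples, size_t n)
{
	size_t i;
	unsigned int best;

	assert(samples != NULL && n >= 2);	/* need at least one warm run */
	best = samples[1];
	for (i = 2; i < n; i++)
		if (samples[i] < best)
			best = samples[i];
	return best;
}
```

With made-up counts of 1900, 440, 436 and 450 cycles from four runs, the cold 1900 is ignored and the result is 436.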

Here is a test framework for MSVC6 + Processor Pack with some inline assembly.
It compares rdtsc, preceded by a serializing cpuid(0) instruction, against a
dumb loop test, which has slight loop overhead but may let successive
iterations overlap.


I tested an SSE2 routine (not posted) that takes a quad bitboard (a 256-bit
board representation) as input and uses multiple Kogge-Stone fills to generate
the attacks of all sliders, including kings as metaqueens, plus pinned-piece
and remove-checker detection for both sides.

Code size of the routine: 2132 bytes.
Measured on an AMD64 at 2.2 GHz. IIRC a P4 (2.4 GHz) was not much worse (about
10%, IIRC), in contrast to some MMX or general-purpose register routines.

The output:
(I was a bit surprised by the rather
 "exact" number of cycles of the loop test)

cycles by rdtsc = 436
cycles by loop  = 431
time in ns      = 196
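
The relation between these output lines can be checked with a little arithmetic. The sketch below mirrors the two printf expressions of the framework; the function names are invented for this example, and the clock rate is passed in as an assumption, just like MYGHZ.

```c
#include <assert.h>

/* Hypothetical helpers mirroring the framework's printf arithmetic. */

/* Average cycles per call, from a wall-clock loop timing
   and an assumed clock rate in Hz. */
double loop_cycles(double elapsed_seconds, double clock_hz, double iterations)
{
	assert(iterations > 0.0);
	return elapsed_seconds * clock_hz / iterations;
}

/* Convert a cycle count to nanoseconds at an assumed clock rate in Hz. */
double cycles_to_ns(double cycles, double clock_hz)
{
	assert(clock_hz > 0.0);
	return cycles / clock_hz * 1e9;
}
```

At 2.2 GHz, cycles_to_ns(431, 2.2e9) gives about 195.9 ns, which agrees with the rounded 196 ns reported above.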

I need 128-bit XMM ALUs! (and more and longer pipes ;-)
How long will it take AMD to keep the promise
from their AMD64 optimization manual?

Cheers,
Gerd


// the test framework
#include <stdio.h>
#include <time.h>

void testRoutine(void); // the routine under test (not posted), defined elsewhere

unsigned int cycles, cpuidRDTSCcycles;

__forceinline void startRDTSC()
{
	__asm
	{	// first measure the overhead of the cpuid / rdtsc instructions themselves:
		// use three runs and take the last (according to Intel)
		xor		eax, eax
		cpuid
		rdtsc
		mov [cpuidRDTSCcycles], eax
		xor		eax, eax
		cpuid
		rdtsc
		sub eax, [cpuidRDTSCcycles]
		mov [cpuidRDTSCcycles], eax

		xor		eax, eax
		cpuid
		rdtsc
		mov [cpuidRDTSCcycles], eax
		xor		eax, eax
		cpuid
		rdtsc
		sub eax, [cpuidRDTSCcycles]
		mov [cpuidRDTSCcycles], eax

		xor		eax, eax
		cpuid
		rdtsc
		mov [cpuidRDTSCcycles], eax
		xor		eax, eax
		cpuid
		rdtsc
		sub eax, [cpuidRDTSCcycles]
		mov [cpuidRDTSCcycles], eax

		// start the actual measurement here
		xor		eax, eax
		cpuid
		rdtsc
		mov [cycles], eax
	}
}


__forceinline void stopRDTSC()
{
	__asm
	{
		xor		eax, eax
		cpuid
		rdtsc
		sub eax, [cycles]
		sub eax, [cpuidRDTSCcycles]
		mov [cycles], eax
	}
}

#define MAX_ITERATIONS 100000000 // 10**8
#define MYGHZ (2.2e9)

int main(int argc, char* argv[])
{
	clock_t start, stop;
	int i;

	for ( i = 0; i < 2; i++)
	{
		startRDTSC();
		testRoutine();
		stopRDTSC();
	}
	printf("cycles by rdtsc = %u\n", cycles);

	start = clock();
	for ( i = 0; i < MAX_ITERATIONS; i++)
		testRoutine();
	stop = clock();

	printf("cycles by loop  = %.3f\n",
		(float)(stop - start) / CLOCKS_PER_SEC * MYGHZ / MAX_ITERATIONS);
	printf("time in ns      = %.3f\n",
		(float)(stop - start) / CLOCKS_PER_SEC * 1e9 / MAX_ITERATIONS);
	getchar();
	return 0;
}





Last modified: Thu, 15 Apr 21 08:11:13 -0700
