Author: Gerd Isenberg
Date: 12:16:19 09/14/04
Hi all,
for the first time I played a bit more with the x86 rdtsc instruction, which reads
the time stamp counter, to measure the number of cycles a routine takes.
There are some pitfalls in using the time stamp counter, which the hardware
increments every cycle. When the test routine runs for the very first time, the
measurement of course also includes any loading of code and data cache lines.
The measured code therefore has to run in a loop at least twice; otherwise the
times are noticeably worse and vary from run to run.
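The warm-up rule can be sketched in portable C. This is only a minimal sketch and not the framework from this post: measure_warm, counter_fn, routine_fn, fake_counter and fake_routine are hypothetical names, and the "clock" is simulated so that the effect is reproducible without rdtsc.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*counter_fn)(void); /* hypothetical tick source */
typedef void (*routine_fn)(void);     /* code under test */

/* Simulated clock and routine, only for illustration: the first call pays
   400 extra ticks for cold caches, every later call costs 100 ticks. */
static uint32_t fake_ticks = 0;
static int fake_calls = 0;

static uint32_t fake_counter(void) { return fake_ticks; }

static void fake_routine(void)
{
    fake_ticks += (fake_calls++ == 0) ? 500u : 100u;
}

/* Run the routine `runs` times and report only the last run. */
static uint32_t measure_warm(counter_fn now, routine_fn routine, int runs)
{
    uint32_t elapsed = 0;
    int i;

    assert(runs >= 2); /* a single run would include the cold-cache penalty */
    for (i = 0; i < runs; i++) {
        uint32_t t0 = now();
        routine();
        elapsed = now() - t0; /* overwritten each run, so the last one wins */
    }
    return elapsed;
}
```

With the simulated routine, measure_warm(fake_counter, fake_routine, 2) returns the warm figure of 100 ticks instead of the 500 ticks of the first call.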
Here is a test framework for msvc6 +pp with some inline assembly. It compares
rdtsc, preceded by a cpuid(0) instruction to serialize execution, against a dumb
loop test, which has slight loop overhead but may let iterations overlap.
I tested an SSE2 routine (not posted) that does multiple multi Kogge-Stone
attack generation for all sliders, including kings treated as metaqueens, plus
pinned-piece and remove-checker detection for both sides. Its input is a
quad-bitboard, a 256-bit board representation.
Code size of the routine: 2132 bytes.
Measured on an AMD64 at 2.2 GHz. IIRC a P4 (2.4 GHz) was not much worse (about
10%, IIRC), in contrast to some MMX or gp routines.
The output (I was a bit surprised by the rather "exact" number of cycles of
the loop test):
cycles by rdtsc = 436
cycles by loop = 431
time in ns = 196
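The loop and time figures are tied together by the 2.2 GHz clock: 196 ns * 2.2 cycles/ns is about 431 cycles. A minimal sketch of that conversion, with ns_to_cycles and cycles_to_ns as hypothetical helper names that are not part of the framework:

```c
#include <assert.h>

/* cycles = nanoseconds * clock frequency in GHz (cycles per nanosecond) */
static double ns_to_cycles(double ns, double ghz)
{
    assert(ghz > 0.0);
    return ns * ghz;
}

/* nanoseconds = cycles / clock frequency in GHz */
static double cycles_to_ns(double cycles, double ghz)
{
    assert(ghz > 0.0);
    return cycles / ghz;
}
```

ns_to_cycles(196.0, 2.2) gives roughly 431.2, matching the loop test, and cycles_to_ns(436.0, 2.2) gives roughly 198.2 ns for the rdtsc figure.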
I need 128-bit XMM ALUs! (and more and longer pipes ;-)
How long will it take AMD to keep the promise
from the AMD64 optimization manual?
Cheers,
Gerd
// the test-framework
#include <stdio.h>
#include <time.h>

void testRoutine(void); // the routine under test (not posted), defined elsewhere

unsigned int cycles, cpuidRDTSCcycles;
__forceinline void startRDTSC()
{
    __asm
    {   // first measure the time of the cpuid / rdtsc instructions themselves
        // use three runs and take the last (according to intel)

        // run 1
        xor eax, eax
        cpuid
        rdtsc
        mov [cpuidRDTSCcycles], eax
        xor eax, eax
        cpuid
        rdtsc
        sub eax, [cpuidRDTSCcycles]
        mov [cpuidRDTSCcycles], eax

        // run 2
        xor eax, eax
        cpuid
        rdtsc
        mov [cpuidRDTSCcycles], eax
        xor eax, eax
        cpuid
        rdtsc
        sub eax, [cpuidRDTSCcycles]
        mov [cpuidRDTSCcycles], eax

        // run 3, this overhead value is the one kept
        xor eax, eax
        cpuid
        rdtsc
        mov [cpuidRDTSCcycles], eax
        xor eax, eax
        cpuid
        rdtsc
        sub eax, [cpuidRDTSCcycles]
        mov [cpuidRDTSCcycles], eax

        // start here
        xor eax, eax
        cpuid
        rdtsc
        mov [cycles], eax
    }
}
__forceinline void stopRDTSC()
{
    __asm
    {
        xor eax, eax
        cpuid
        rdtsc
        sub eax, [cycles]           // elapsed since startRDTSC
        sub eax, [cpuidRDTSCcycles] // minus the cpuid / rdtsc overhead
        mov [cycles], eax
    }
}
#define MAX_ITERATIONS 100000000 // 10**8
#define MYGHZ (2.2e9)

int main(int argc, char* argv[])
{
    clock_t start, stop;
    int i;

    // run twice, so the reported value comes from the second (warm) run
    for ( i = 0; i < 2; i++)
    {
        startRDTSC();
        testRoutine();
        stopRDTSC();
    }
    printf("cycles by rdtsc = %u\n", cycles); // cycles is unsigned

    start = clock();
    for ( i = 0; i < MAX_ITERATIONS; i++)
        testRoutine();
    stop = clock();

    printf("cycles by loop = %.3f\n", (float)(stop - start) / CLOCKS_PER_SEC *
           MYGHZ / MAX_ITERATIONS);
    printf("time in ns = %.3f\n", (float)(stop - start) / CLOCKS_PER_SEC *
           1e9 / MAX_ITERATIONS);
    getchar();
    return 0;
}
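The framework keeps only the low 32 bits of the time stamp counter (eax). Below is a minimal sketch, assuming the measured interval is shorter than 2^32 cycles (about 1.95 s at 2.2 GHz), of why the subtraction in stopRDTSC still works when the low half wraps around between the two samples. tsc_delta32 is a hypothetical helper in plain C and not part of the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles from two low-32-bit TSC samples, minus the measured
   cpuid/rdtsc overhead. Unsigned arithmetic is modulo 2^32, so a wrap of
   the low half between the two samples does not change the difference. */
static uint32_t tsc_delta32(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t delta = (uint32_t)(stop - start);

    assert(delta >= overhead); /* otherwise the overhead figure is off */
    return delta - overhead;
}
```

For example, tsc_delta32(0xFFFFFF00u, 0x000000B4u, 0u) yields 436 although the second sample is numerically smaller than the first.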