Author: Matt Taylor
Date: 15:44:37 02/13/03
On February 13, 2003 at 14:59:18, Tom Kerrigan wrote:
>Well, I guess I'm wrong about P6 cmov then. Somebody told me that it was broken
>down into uops that were the equivalent of doing a branch (i.e., one of the uops
>was a branch) because the decision to add cmov to the P6 came late, and the
>ability to do predicated (sorry, conditional) moves in the datapath would have
>been too big a change.
>
>Although so far it sounds like nobody has actually done a test to see if cmov is
>faster, slower, or equal to a branch. You have your timing numbers (apparently
>from a manual) and Bob sort of vaguely recalls running something on a Pentium
>3...
>
>-Tom
Uh, no. The manual says 2 u-ops for reg-reg and 3 u-ops for reg-mem, but the
figures I posted were measured on my 350 MHz Pentium II, not taken from the
manual. I got my timing numbers from the following code:
#include <stdio.h>

int main(void)
{
    int i, base, lat, thpt;

    /* Run everything 16 times so the code is warm; only the numbers from
       the last iteration are printed. "volatile" keeps GCC from hoisting
       the timed blocks out of the loop or dropping them. */
    for(i = 0; i < 16; i++)
    {
        /* Baseline: cost of the cpuid/rdtsc framing with no cmov inside. */
        asm volatile("cpuid\n\t"
            "cpuid\n\t"
            "cpuid\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "movl %%eax, %%edi\n\t"
            "movl %%ecx, %%ecx\n\t"
            "movl %%edx, %%edx\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "subl %%edi, %%eax"
            : "=a" (base)
            :
            : "%ebx", "%ecx", "%edx", "%edi");

        /* Throughput: 32 cmovnz that all read edi; the destination
           rotates through eax, ebx, ecx, edx. */
        asm volatile("cpuid\n\t"
            "cpuid\n\t"
            "cpuid\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "movl %%eax, %%edi\n\t"
            "movl %%ecx, %%ecx\n\t"
            "movl %%edx, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cmovnz %%edi, %%eax\n\t"
            "cmovnz %%edi, %%ebx\n\t"
            "cmovnz %%edi, %%ecx\n\t"
            "cmovnz %%edi, %%edx\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "subl %%edi, %%eax"
            : "=a" (thpt)
            :
            : "%ebx", "%ecx", "%edx", "%edi");

        /* Latency: 32 cmovnz in one chain; each reads the register
           written by the one before it. */
        asm volatile("cpuid\n\t"
            "cpuid\n\t"
            "cpuid\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "movl %%eax, %%edi\n\t"
            "movl %%ecx, %%ecx\n\t"
            "movl %%edx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cmovnz %%edx, %%eax\n\t"
            "cmovnz %%eax, %%ebx\n\t"
            "cmovnz %%ebx, %%ecx\n\t"
            "cmovnz %%ecx, %%edx\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "subl %%edi, %%eax"
            : "=a" (lat)
            :
            : "%ebx", "%ecx", "%edx", "%edi");
    }

    /* Subtract the framing overhead; what is left is the clock count
       for the 32 cmovnz in each block. */
    printf("base: %d clocks\n", base);
    printf("throughput: %d\n", thpt - base);
    printf("latency: %d\n", lat - base);
    return 0;
}
Output:
base: 119 clocks
throughput: 62
latency: 65
-Matt