Computer Chess Club Archives



Subject: Interesting info regarding Intel compiler v8.0

Author: Yen Art Tham

Date: 08:34:19 02/10/04


From: iccOut (iccout2004@yahoo.com)
Subject: sleazy intel compiler trick (SOURCE ATTACHED)
Newsgroups: comp.arch
Date: 2004-02-09 14:38:40 PST

As part of my study of operating systems and embedded systems, one of
the things I've been looking at is compilers. I'm interested in
analyzing how different compilers optimize code for different
platforms. As part of this comparison, I was looking at the Intel
Compiler and how it optimizes code. The Intel compilers are available
as a free evaluation download here:
http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers


Version 8.0 of the Intel compiler introduced an "Intel-specific" flag.
According to the documentation, binaries compiled with this flag would
only run on Intel processors and would include Intel-specific
optimizations to make them run faster. Unfortunately, the documentation
did not explain what these optimizations were, so I decided to do some
investigating.

I wanted a primarily CPU-bound test, so I chose SPEC CPU2000. The test
system was a 3.2 GHz Pentium 4 Extreme Edition with 1 GB of RAM running
Windows XP Pro. First I compiled and ran SPEC with the "generic x86"
flag (-QxW), which compiles code to run on any x86 processor. After
running the generic version, I recompiled and ran SPEC with the
"Intel-specific" flag (-QxN) to see what kind of difference that would
make. For most benchmarks there was little change, but 181.mcf ran
almost 22% faster!

Curious about what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binaries on my friend's computer. His computer, the second
test machine, was an AMD FX51, also with 1 GB of RAM, running Windows
XP Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones printed
an error message saying that the processor was not supported and
exited. That wasn't very helpful. Was it true that only Intel
processors could take advantage of this performance boost?

I started digging through a disassembly of the Intel-specific binary
and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2, and it also uses CPUID to
ensure that it's an Intel processor. I wrote a quick utility, which I
call iccOut, that goes through a binary compiled with this Intel-only
flag and removes that check.
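To make the kind of check described above concrete, here is a minimal C sketch of a vendor-gated CPU test. It is not the actual code inside proc_init_N; it assumes a GCC- or Clang-style compiler whose `<cpuid.h>` provides `__get_cpuid`, and the helper names (`decodeVendor`, `hasSseAndSse2`, `passesIntelOnlyCheck`, `queryCpu`) are hypothetical. What it shows is that the SSE/SSE2 feature test and the "GenuineIntel" vendor test are separate conditions: an AMD processor with SSE2 passes the first and fails only the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>  /* GCC/Clang wrapper for the CPUID instruction */
#define HAVE_CPUID 1
#endif

/* CPUID leaf 1 reports feature flags in EDX: bit 25 = SSE, bit 26 = SSE2. */
#define EDX_SSE  (1u << 25)
#define EDX_SSE2 (1u << 26)

/* Copies one 32-bit register into 4 chars, lowest byte first (x86 order). */
static void putRegister( char *dst, uint32_t reg ) {
    for ( int i = 0; i < 4; i++ ) {
        dst[i] = (char)( ( reg >> ( 8 * i ) ) & 0xFFu );
    }
}

/* CPUID leaf 0 returns the 12-character vendor string split across
 * EBX, EDX and ECX, in that order. */
static void decodeVendor( uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13] ) {
    assert( out != NULL );
    putRegister( out,     ebx );
    putRegister( out + 4, edx );
    putRegister( out + 8, ecx );
    out[12] = '\0';
}

/* Feature-based test: does the CPU have SSE and SSE2, whoever made it? */
static bool hasSseAndSse2( uint32_t leaf1Edx ) {
    return ( leaf1Edx & EDX_SSE ) != 0 && ( leaf1Edx & EDX_SSE2 ) != 0;
}

/* Hypothetical vendor-gated test: the same feature test plus a
 * "GenuineIntel" requirement (an illustration, not Intel's code). */
static bool passesIntelOnlyCheck( const char *vendor, uint32_t leaf1Edx ) {
    return strcmp( vendor, "GenuineIntel" ) == 0 && hasSseAndSse2( leaf1Edx );
}

/* Reads this machine's vendor string and leaf-1 EDX.
 * Returns false when CPUID is unavailable (non-x86 or non-GCC/Clang build). */
static bool queryCpu( char vendor[13], uint32_t *leaf1Edx ) {
#ifdef HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if ( !__get_cpuid( 0, &eax, &ebx, &ecx, &edx ) ) return false;
    decodeVendor( ebx, edx, ecx, vendor );
    if ( !__get_cpuid( 1, &eax, &ebx, &ecx, &edx ) ) return false;
    *leaf1Edx = edx;
    return true;
#else
    (void)vendor;
    (void)leaf1Edx;
    return false;
#endif
}
```

A caller would typically run `queryCpu` once at startup, then print a "processor not supported" message and exit when `passesIntelOnlyCheck` returns false. iccOut goes further and removes the whole call, feature test included, which is why its header warns that the CPU must still support SSE and SSE2.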

Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium 4. This is
very interesting to me: if the AMD processor can take advantage of
these same optimizations, then it appears that no Intel-specific
optimization has actually been done. If I'm missing something, I'd love
for someone to point it out. As things look right now, it appears that
Intel is simply "cheating" to make its processors look better against
competitors' processors.

Links:
Intel Compiler: http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers


Here is the source code:

/*
 * iccOut 1.0
 *
 * This program enables programs compiled with the Intel compiler using the
 * -xN flag to run on non-Intel processors. This can sometimes result in
 * large performance increases, depending on the application. Note that even
 * though the check will be removed, the CPU running the application *MUST*
 * support both SSE and SSE2 or the program will crash.
 *
 */

#include <stdbool.h> // bool, true, false when compiled as C (C99 or later)
#include <stdio.h>
#include <string.h>


// x86 opcodes

#define X86_CALL 0xE8  // CALL rel32 (followed by a 4-byte relative offset)
#define PUSH_EAX 0x50  // PUSH EAX
#define X86_NOP  0x90  // NOP

bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE* fixedBinary );

void printUsage() {
 printf("Usage:\n");
 printf("iccOut filename\n\n");
 printf("Filename is the name of the file to fix.\n\n");
}


//conveniently, the check always seems to be one of the first calls in
//the file, which makes it easier to find.
//called right after a call opcode (E8) has been read; returns whether code was replaced
bool processNextCall( FILE* inputBinary, FILE* fixedBinary ) {

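 int startIndex, bytesNeeded;
 unsigned char addressBuffer[4];
 unsigned char checkBuffer[2];
 unsigned char fullBuffer[7];
 unsigned char tempChar;
 bool codeReplaced;
 bool otherReplaced;
 size_t addrRead, checkRead;

 otherReplaced = false;

 //read the call's 4-byte offset and the 2 bytes that follow it
 addrRead = fread( addressBuffer, 1, 4, inputBinary );
 checkRead = ( addrRead == 4 ) ? fread( checkBuffer, 1, 2, inputBinary ) : 0;

 if ( addrRead < 4 || checkRead < 2 ) {
  //the file ends too soon for a full match: copy the leftovers and stop
  tempChar = X86_CALL;
  fwrite( &tempChar, 1, 1, fixedBinary );
  fwrite( addressBuffer, 1, addrRead, fixedBinary );
  fwrite( checkBuffer, 1, checkRead, fixedBinary );
  return false;
 }

 fullBuffer[0] = X86_CALL;
 for( int i=1; i<5;i++ ) {
  fullBuffer[i] = addressBuffer[i-1];
 }
 fullBuffer[5] = checkBuffer[0];
 fullBuffer[6] = checkBuffer[1];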

 codeReplaced = handleCall( fullBuffer, inputBinary, fixedBinary );

 if ( ! codeReplaced ) {

  //if either of the last 2 bytes was a call opcode, it may start the call
  //we are looking for, so keep re-scanning until no call opcode is left over
  while ( ( fullBuffer[5] == X86_CALL ) || ( fullBuffer[6] == X86_CALL ) ) {

   if ( fullBuffer[5] != X86_CALL ) { //byte 5 is not a call: write it and shift
    tempChar = fullBuffer[5];
    fwrite( &tempChar, 1, 1, fixedBinary );
    fullBuffer[0] = fullBuffer[6];
    bytesNeeded = 6;
    startIndex = 1;
   } else {
    fullBuffer[0] = fullBuffer[5];
    fullBuffer[1] = fullBuffer[6];
    bytesNeeded = 5;
    startIndex = 2;
   }

   //refill the rest of the 7-byte window; stop cleanly at end of file
   size_t got = fread( &fullBuffer[startIndex], 1, bytesNeeded, inputBinary );
   if ( got < (size_t)bytesNeeded ) {
    fwrite( fullBuffer, 1, startIndex + got, fixedBinary );
    return otherReplaced;
   }

   //call handleCall first so it always runs (|| would short-circuit it)
   bool replacedNow = handleCall( fullBuffer, inputBinary, fixedBinary );
   otherReplaced = otherReplaced || replacedNow;
  }
 }
 return ( codeReplaced || otherReplaced );
}

//checks the 7-byte window in theBuffer (call opcode, 4-byte offset, 2 more
//bytes) and writes it to fixedBinary; returns whether code was replaced
bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE* fixedBinary ) {

 bool replacedCode;
 unsigned char tempChar;

 replacedCode = false;

 //check if it's what we're looking for (a call followed by 2 PUSH EAX's)
 if ( ( theBuffer[5] == PUSH_EAX ) && ( theBuffer[6] == PUSH_EAX ) ){
  printf("Located call to subroutine to check intel support!\n");
  printf("Substituting code ...\n");

  //replace the 5-byte call (opcode + offset) with NOPs
  replacedCode = true;
  for ( int i=0; i<5;i++ ) {
   theBuffer[i] = X86_NOP;
  }
 }

 if ( replacedCode || ( ( theBuffer[5] != X86_CALL ) && ( theBuffer[6] != X86_CALL ) )) {
  //write out all 7 bytes (with the call NOP'd out if it matched)
  for ( int j=0; j<7;j++ ) {
   tempChar = theBuffer[j];
   fwrite( &tempChar, 1, 1, fixedBinary );
  }
 } else {
  //don't write the last 2 bytes: one of them starts another call,
  //which the caller examines next
  for( int i=0; i < 5; i++ ) {
   tempChar = theBuffer[i];
   fwrite( &tempChar, 1, 1, fixedBinary );
  }
 }
 return replacedCode;
}

void fixIntelBinary( char *filename ) {

 FILE *inputBinary;
 FILE *fixedBinary;
 char fixedName[FILENAME_MAX];
 unsigned char theChar;
 bool editedCall;
 bool skipWrite;
 size_t lenRead;

 printf("iccOut is currently fixing binary: %s\n\n", filename );

 editedCall = false;
 skipWrite = false;

 //build the output name in its own buffer: strcat( filename, ".fixed" )
 //would write past the end of argv[1]
 if ( snprintf( fixedName, sizeof fixedName, "%s.fixed", filename ) >= (int)sizeof fixedName ) {
  printf("File name is too long: %s\n", filename );
  return;
 }

 //open files for reading and writing
 inputBinary = fopen( filename, "rb" );
 if ( ! inputBinary ) {
  printf("Error opening input binary.\n");
  return;
 }

 fixedBinary = fopen( fixedName, "wb" );
 if ( ! fixedBinary ) {
  printf("Error opening output file.\n");
  fclose( inputBinary );
  return;
 }

 //copy the file byte by byte; each call opcode is handed to
 //processNextCall (which writes it itself) until the check is patched
 lenRead = fread( &theChar, 1, 1, inputBinary );
 while ( lenRead == 1 ) {
  if ( !skipWrite ) {
   //write last value
   fwrite( &theChar, 1, 1, fixedBinary );
  }
  skipWrite = false;

  //read next; the loop ends at end of file
  lenRead = fread( &theChar, 1, 1, inputBinary );

  if ( lenRead == 1 && ! editedCall && theChar == X86_CALL ) {
   editedCall = processNextCall( inputBinary, fixedBinary );
   skipWrite = true;
  }
 }

 if ( editedCall ) {
  printf("iccOut has saved the day! Output written to %s\n", fixedName );
 } else {
  printf("Intel check not found; %s is an unmodified copy.\n", fixedName );
 }

 //close files when finished
 fclose( inputBinary );
 fclose( fixedBinary );
}

bool fileExists( char *filename ) {

 FILE *temp;
 bool ret = false;

 temp = fopen( filename, "r" );

 if ( temp != 0 ) {
  ret = true;
  fclose( temp );
 }
 return ret;
}

int main( int argc, char **argv ) {

 printf("\nWelcome to iccOut!\n\n");
 printf("This will enable binaries compiled with -xN to run on non-intel machines\n\n");

 //verify parameters
 if ( argc < 2 ) {
  printUsage();
  return 0;
 }

 //make sure file exists
 if ( ! fileExists( argv[1] ) ) {
  printf("File does not exist or is not accessible: %s\n", argv[1] );
  return 0;
 }

 fixIntelBinary( argv[1] );
 return 0;
}
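For comparison, the same pattern search can be written over an in-memory buffer instead of a byte-at-a-time file stream. This is a minimal sketch under the same assumption iccOut makes, namely that the check is a CALL rel32 (opcode E8) whose 4-byte offset is followed by two PUSH EAX (50) bytes; `patchFirstCheck` and the `OP_` macro names are hypothetical. Unlike iccOut, which skips the four offset bytes after every E8 it sees, this version tests every byte offset, so on some files it could match a different location. Like iccOut, it is a raw byte scan, not a disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OP_CALL_REL32 0xE8  /* CALL rel32: opcode followed by a 4-byte offset */
#define OP_PUSH_EAX   0x50  /* PUSH EAX */
#define OP_NOP        0x90  /* NOP */

/* Looks for the first "E8 xx xx xx xx 50 50" sequence in buf and overwrites
 * the 5-byte call with NOPs, leaving the two PUSH EAX bytes in place.
 * Returns the offset of the patched call, or -1 if no match was found. */
static long patchFirstCheck( unsigned char *buf, size_t len ) {
    assert( buf != NULL || len == 0 );
    for ( size_t i = 0; i + 7 <= len; i++ ) {
        if ( buf[i] == OP_CALL_REL32 &&
             buf[i + 5] == OP_PUSH_EAX && buf[i + 6] == OP_PUSH_EAX ) {
            memset( buf + i, OP_NOP, 5 );  /* opcode + 4 offset bytes */
            return (long)i;
        }
    }
    return -1;
}
```

To use it on a file, read the whole binary into a buffer with fread, call patchFirstCheck, and write the buffer out under a new name (iccOut uses a ".fixed" suffix).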




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.