Computer Chess Club Archives


Subject: Source code to measure it

Author: Vincent Diepeveen

Date: 03:24:58 07/15/03


On July 14, 2003 at 16:07:27, Robert Hyatt wrote:

Those benchmarks measure latency with sequential reads.
Data in a cache line that is already open arrives faster than
data from a random read to memory.

Random reads to memory take about 280 ns on a single-CPU P4 and about 400 ns on
dual P4s.

I will now post my source code here to measure it. It works with
Visual C++ as well as on *nix systems.

Compile it and run it for example with a buffer of 500MB and 2 processors:

c:\win2000\> latency 500000000 2


/*-----------------10-6-2003 3:48-------------------*
 *
 * This program rasml.c measures the Random Average Shared Memory Latency
 * (RASML)
 * Thanks to Agner Fog for his excellent random number generator.
 *
 * This testset is using a heavily optimized and to 64 bits modified version
 * of Agner Fog's ranrot generator.
 *
 * Created by Vincent Diepeveen who is author of this and therefore has
 * the copyright.
 *
 * Nevertheless i encourage persons to use this test UNMODIFIED. Its intention
 * is to measure the average latency to read and write data to shared memory
 * at all the processors at the same time.
 *
 * What it does is allocate a big block of memory (gigabytes or
 * terabytes preferably), and then n processes read from that
 * memory in a RANDOM way; another test is reading AND writing
 * in a random way. All the processors perform the same action. They
 * keep the results and write them back to shared memory. Then all the processes
 * except P0 quit. P0 then calculates the average over all the processors
 * and prints it clearly to the screen, expressed in nanoseconds.
 *
 * Of course the smallest datasize used in this testset is 64 bits.
 * I wouldn't know how to else access more than 2^32 bytes.
 *
 * There are many things to consider when doing such tests. Like Level1 cache,
 * Level2 cache, caches at routers and another big bunch of tricks. The caches
 * i clearly mention here because a lookup might by accident already have been
 * done before by the same processor or by another processor in the same node
 * that uses the same RAM.
 *
 * Another influence on the calculated times is caused by the random number
 * generator.
 *
 * Currently it gets initialized in a very primitive way.
 *
 * There is a big need for this test i feel. In the future more and more
 * Artificial Intelligence and/or searching software will be there. They all
 * will be busy doing a lot of random accesses to the RAM.
 *
 * The original reason to create this testset is very sad.

 *       "The paper supports everything"
 *                            (Arturo Ochoa at Caracas, Venezuela)
 *
 * Especially of course when you never actually test the latency. A few quick
 * searches on the internet already show that paper supports everything with
 * regards to latency.
 *
 * Copyrights: i have extensively searched the past year for 'random average
 * shared memory latencies'. I found nothing about memory latencies in general
 * that even *approaches* the reality that programmers must deal with, despite
 * all the paper latencies.
 *
 * Therefore i claim unconditional definition rights to 'random average shared
 * memory latency' (RASML). In order to measure and publish random memory
 * latencies, this source code may not get modified without written permission
 * by me.
 *
 * In that way i avoid the usual problems that are there in supercomputing
 * currently, where marketing managers use their own definition of the word
 * 'latency'.
 *
 * Currently the word latency by marketing managers is most likely 'the speed
 * that i imagine my product might be able to achieve at a certain component
 * of a smaller version of the machine, without taking into account inferior
 * parts of the computer which prevent such fantastic latency numbers in
 * practice'.
 *
 * Vincent Diepeveen                 diep@xs4all.nl
 * Veenendaal, The Netherlands       10 june 2003
 *
 * first a few lines about the random number generator. Note that I modified it
 * very slightly. Basically its initialization has been done better and some
 * dead slow FPU code has been removed.
 */

#define UNIX 0  /* put to 1 when you are under unix or using gcc-like compilers */
#define IRIX 0  /* this value only matters when UNIX is set to 1. For Linux put
                 * to 0. Basically allocating shared memory in linux is done in
                 * a pretty buggy way in its kernel.
                 *
                 * Therefore you might want to do 'cat /proc/sys/kernel/shmmax'
                 * and look for yourself how much shared memory YOU can allocate
                 * in linux.
                 *
                 * If that is not enough to benchmark this program then try
                 * modifying it with:
                 *    echo <newsize> > /proc/sys/kernel/shmmax
                 * Be sure you are root when doing that; it has to be done
                 * each time the system boots.
                 */
#define FREEBSD 1 /* be sure to not use more than 2 GB memory with freebsd with this test. sorry. */


#if UNIX
  #include <pthread.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>
  #include <sys/times.h>
  #include <sys/time.h>
  #include <unistd.h>
#else
  #include <windows.h>
  #include <winbase.h> // for GetTickCount()
  #include <process.h> // _spawnl
#endif

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define SWITCHTIME       300000 /* in milliseconds (300000 ms = 300 seconds).
                                 * Modify this to let a test run longer.
                                 * Basically it is a good idea to use about the
                                 * cpu number times thousand for this. 30 seconds
                                 * is fine for PCs, but a very bad idea for
                                 * supercomputers. I recommend several minutes
                                 * there. Of course that lets a test take way,
                                 * way longer.
                                 */
#define MAXPROCESSES     2048   /* maximum number of processes this test can use */
#define CACHELINELENGTH  128    /* cache line length of the machine. Modify this if you want to */


#if UNIX
  #include <memory.h>
  #define FORCEINLINE       __inline
  /* UNIX and such this is 64 bits unsigned variable: */
  #define BITBOARD                     unsigned long long
#else
  #define FORCEINLINE       __forceinline
  /* in WINDOWS we also want to be 64 bits: */
  #define BITBOARD                     unsigned _int64
#endif

#define     STATUS_NOTSTARTED    0
#define     STATUS_READ          1
#define     STATUS_MEASUREREAD   2
#define     STATUS_MEASUREDREAD  3

#define     STATUS_QUIT         10

struct ProcessState {
  volatile int status; /*  0  = not started yet
                        *  1  = ready to start reading
                        *
                        *  10 = quitted
                        * */

  /* now the numbers each cpu gathers. The name of the first number is what
   * cpu0 is doing and the second name what all the other cpu's were doing at
   * that time
   */
  volatile BITBOARD readread; /* */
  char dummycacheline[CACHELINELENGTH];
};

typedef struct {
  BITBOARD nentries; // number of entries of 64 bits used for cache.
  struct ProcessState ps[MAXPROCESSES];
} GlobalTree;

void     RanrotAInit(void);
float    ToNano(BITBOARD);
int      GetClock(void);
float    TimeRandom(void);

void     ParseBuffer(BITBOARD);
void     ClearHash(void);
void     DeAllocate(void);
int      DoNrng(BITBOARD);
int      DoNreads(BITBOARD);
int      DoNreadwrites(BITBOARD);
void     TestLatency(float);
int      AllocateTree(void);
void     InitTree(int);
void     WaitForStatus(int,int);
void     PutStatus(int,int);
int      CheckAllStatus(int,int);
void     Slapen(int);
float    LoopRandom(void);



/* define parameters (R1 and R2 must be smaller than the integer size): */
#define KK  17
#define JJ  10
#define R1   5
#define R2   3

/* global variables Ranrot */
BITBOARD randbuffer[KK+3] = { /* history buffer filled with some random numbers */

0x92930cb295f24dab,0x0d2f2c860b685215,0x4ef7b8f8e76ccae7,0x03519154af3ec239,0x195e36fe715fad23,

0x86f2729c24a590ad,0x9ff2414a69e4b5ef,0x631205a6bf456141,0x6de386f196bc1b7b,0x5db2d651a7bdf825,

0x0d2f2c86c1de75b7,0x5f72ed908858a9c9,0xfb2629812da87693,0xf3088fedb657f9dd,0x00d47d10ffdc8a9f,

0xd9e323088121da71,0x801600328b823ecb,0x93c300e4885d05f5,0x096d1f3b4e20cd47,0x43d64ed75a9ad5d9

/*0xa05a7755512c0c03,0x960880d9ea857ccd,0x7d9c520a4cc1d30f,0x73b1eb7d8891a8a1,0x116e3fc3a6b7aadb*/
};
int r_p1, r_p2;          /* indexes into history buffer */

/* global variables RASML */
BITBOARD *hashtable,nentries,globaldummy=0;
GlobalTree *tree;
int ProcessNumber;
#if UNIX
int shm_tree,shm_hash;
#endif
char rasmexename[2048];

 /******************************************************** AgF 1999-03-03 *
 *  Random Number generator 'RANROT' type B                               *
 *  by Agner Fog                                                          *
 *                                                                        *
 *  This is a lagged-Fibonacci type of random number generator with       *
 *  rotation of bits.  The algorithm is:                                  *
 *  X[n] = ((X[n-j] rotl r1) + (X[n-k] rotl r2)) modulo 2^b               *
 *                                                                        *
 *  The last k values of X are stored in a circular buffer named          *
 *  randbuffer.                                                           *
 *                                                                        *
 *  This version works with any integer size: 16, 32, 64 bits etc.        *
 *  The integers must be unsigned. The resolution depends on the integer  *
 *  size.                                                                 *
 *                                                                        *
 *  Note that the function RanrotAInit must be called before the first    *
 *  call to RanrotA or iRanrotA                                           *
 *                                                                        *
 *  The theory of the RANROT type of generators is described at           *
 *  www.agner.org/random/ranrot.htm                                       *
 *                                                                        *
 *************************************************************************/

FORCEINLINE BITBOARD rotl(BITBOARD x,int r) {return(x<<r)|(x>>(64-r));}

/* returns a random number of 64 bits unsigned */
FORCEINLINE BITBOARD RanrotA(void) {
  /* generate next random number */
  BITBOARD x = randbuffer[r_p1] = rotl(randbuffer[r_p2],R1) +
                                  rotl(randbuffer[r_p1],R2);
  /* rotate list pointers */
  if( --r_p1 < 0)
    r_p1 = KK - 1;
  if( --r_p2 < 0 )
    r_p2 = KK - 1;
  return x;
}

/* this function initializes the random number generator.      */
void RanrotAInit(void) {
  int i;

  /* one can fill the randbuffer here with possible other values here */

  /* initialize pointers to circular buffer */
  r_p1 = 0;
  r_p2 = JJ;

  /* randomize */
  for( i = 0; i < 300; i++ )
    (void)RanrotA();
}

/* Now the RASML code */
char *To64(BITBOARD x) {
  static char buf[256];
  char *sb;

  sb = &buf[0];
  #if UNIX
  sprintf(buf,"%llu",x);
  #else
  sprintf(buf,"%I64u",x);
  #endif
  return sb;
}

int GetClock(void) {
/* The accuracy is measured in milliseconds. The used function is very accurate
 * according to the NT team, way more accurate nowadays than mentioned in the
 * MSDN manual. The accuracy for linux or unix we can only guess. Too many
 * experts there.
 */
  #if UNIX
  struct timeval timeval;
  struct timezone timezone;
  gettimeofday(&timeval, &timezone);
  return((int)(timeval.tv_sec*1000+(timeval.tv_usec/1000)));
  #else
  return((int)GetTickCount());
  #endif
}

float ToNano(BITBOARD nps) {
  /* convert something from times a second to nanoseconds.
   * NOTE THAT THERE IS COMPILER BUGS SOMETIMES AT OLD COMPILERS
   * SO THAT'S WHY MY CODE ISN'T A 1 LINE RETURN HERE. PLEASE DO
   * NOT MODIFY THIS CODE */
  float tn;
  tn = 1000000000/(float)nps;
  return tn;
}

float TimeRandom(void) {
  /* timing the random number generator is very easy of course. Returns
   * number of random numbers a second that can get generated
   */
  BITBOARD bb=0,i,value,nps;
  float ns_rng;
  int t1,t2,took;

  printf("Benchmarking Pseudo Random Number Generator speed, RanRot type 'B'!\n");
  printf("Speed depends upon CPU and compile options from RASML,\n therefore we benchmark the RNG\n");
  printf("Please wait a few seconds.. "); fflush(stdout);
  value = 100000;
  took  = 0;
  while( took < 3000 ) {
    value <<= 2; //  x4
    t1 = GetClock();

    for( i = 0; i < value; i++ ) {
      bb ^= RanrotA();
    }
    t2 = GetClock();
    took = t2-t1;
  }

  nps = (1000*value)/(BITBOARD)took;

  #if UNIX
  printf("..took %i milliseconds to generate %llu numbers\n",took,value);
  printf("Speed of RNG = %llu numbers a second\n",nps);
  #else
  printf("..took %i milliseconds to generate %I64u numbers\n",took,value);
  printf("Speed of RNG = %I64u numbers a second\n",nps);
  #endif

  ns_rng = ToNano(nps);
  printf("So 1 RNG call takes %f nanoseconds\n",ns_rng);


  return ns_rng;
}

void ParseBuffer(BITBOARD nbytes) {
  tree->nentries = nbytes/sizeof(BITBOARD);
  #if UNIX
  printf("Trying to allocate %llu entries. ",tree->nentries);
  printf("In total %llu bytes\n",tree->nentries*(BITBOARD)sizeof(BITBOARD));
  #else
  printf("Trying to allocate %s entries. ",To64(tree->nentries));
  printf("In total %s bytes\n",To64(tree->nentries*(BITBOARD)sizeof(BITBOARD)));
  #endif
}

void ClearHash(void) {
  BITBOARD i,nentries = tree->nentries;
  /* clearing hashtable */
  printf("Clearing hashtable\n");
  for( i = 0 ; i < nentries ; i++ ) /* very unoptimized way of clearing */
    hashtable[i] = i;
}

void DeAllocate(void) {
  #if UNIX
  shmctl(shm_tree,IPC_RMID,0);
  shmctl(shm_hash,IPC_RMID,0);
  #else
  UnmapViewOfFile(tree);
  UnmapViewOfFile(hashtable);
  #endif
}

int DoNrng(BITBOARD n) {
  BITBOARD i=1,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;

  t1 = GetClock();
  do {
    BITBOARD index = RanrotA()%nents;
    dummyres ^= index;
  } while( i++ < n );
  t2 = GetClock();

  globaldummy = dummyres;
  return(t2-t1);
}

int DoNreads(BITBOARD n) {
  BITBOARD i=1,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;

  t1 = GetClock();
  do {
    BITBOARD index = RanrotA()%nents;
    dummyres ^= hashtable[index];
  } while( i++ < n );
  t2 = GetClock();

  globaldummy = dummyres;

  return(t2-t1);
}

int DoNreadwrites(BITBOARD n) {
  BITBOARD i=1,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;

  t1 = GetClock();
  do {
    BITBOARD index = RanrotA()%nents;
    dummyres ^= hashtable[index];
    hashtable[index] = dummyres;
  } while( i++ < n );
  t2 = GetClock();

  globaldummy = dummyres;

  return(t2-t1);
}

void TestLatency(float ns_rng) {
  BITBOARD n,nps_read,nps_rw,nps_rng;
  float ns,fns;
  int timetaken;

  printf("Doing random RNG test. Please wait..\n");
  n = 50000000; // 50 mln
  timetaken = DoNrng(n);
  nps_rng = (1000*n) / (BITBOARD)timetaken;
  fns  = ToNano(nps_rng);
  printf("Machine needs %f ns for RND loop\n",fns);

  /* READING SINGLE CPU RANDOM ENTRIES */
  printf("Doing random read tests single cpu. Please wait..\n");
  n = 100000000; // 100 mln
  timetaken = DoNreads(n);
  nps_read = (1000*n) / (BITBOARD)timetaken;
  ns  = ToNano(nps_read);
  printf("Machine needs %f ns for single cpu random reads.\nExtrapolated=%f nanoseconds a read\n",ns,ns-fns);

  /* READING AND THEN WRITING SINGLE CPU RANDOM ENTRIES */
  printf("Doing random readwrite tests single cpu. Please wait..\n");
  n = 100000000; // 100 mln
  timetaken = DoNreadwrites(n);
  nps_rw = (1000*n) / (BITBOARD)timetaken;
  ns  = ToNano(nps_rw);
  printf("Machine needs %f ns for single cpu random readwrites.\n",ns);
  printf("Extrapolated=%f nanoseconds a readwrite (to the same slot)\n\n",ns-fns);

  printf("So far the useless tests.\nBut we have vague read/write nodes a second numbers now\n");
}

int AllocateTree(void) { /* initialize the tree. returns 0 if error */
  #if UNIX
  shm_tree = shmget(
              #if IRIX
              ftok(".",'t'),
              #else
              IPC_PRIVATE,
              #endif
              sizeof(GlobalTree),IPC_CREAT|0777);
  if( shm_tree == -1 )
    return 0;
  tree = (GlobalTree *)shmat(shm_tree,0,0);
  if( tree == (GlobalTree *)-1 )
    return 0;
  #else /* so windows NT. This might even work under win98 and such crap OSes, but not win95 */
  if( !ProcessNumber ) {
    HANDLE TreeFileMap;
    /* INVALID_HANDLE_VALUE asks for a mapping backed by the paging file */
    TreeFileMap = CreateFileMapping(INVALID_HANDLE_VALUE,NULL,PAGE_READWRITE,0,
     (DWORD)sizeof(GlobalTree),"RASM_Tree");
    if( TreeFileMap == NULL )
      return 0;
    tree = (GlobalTree *)MapViewOfFile(TreeFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( tree == NULL )
      return 0;
  }
  else { /* Slaves also try to attach to the tree */
    HANDLE TreeFileMap;
    TreeFileMap = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,"RASM_Tree");
    if( TreeFileMap == NULL )
      return 0;
    tree = (GlobalTree *)MapViewOfFile(TreeFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( tree == NULL )
      return 0;
  }
  #endif
  return 1;
}

int AllocateHash(void) { /* initialize the hashtable (cache). returns 0 if error */
  #if UNIX
  shm_hash = shmget(
              #if IRIX
              ftok(".",'h'),
              #else
              IPC_PRIVATE,
              #endif
              tree->nentries*8,IPC_CREAT|0777);
  if( shm_hash == -1 )
    return 0;
  hashtable = (BITBOARD *)shmat(shm_hash,0,0);
  if( hashtable == (BITBOARD *)-1 )
    return 0;
  #else /* so windows NT. This might even work under win98 and such crap OSes, but not win95 */
  if( !ProcessNumber ) {
    HANDLE HashFileMap;
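    /* pass the size as high and low 32-bit halves so that it is not truncated */
    HashFileMap = CreateFileMapping(INVALID_HANDLE_VALUE,NULL,PAGE_READWRITE,
     (DWORD)((tree->nentries*8)>>32),(DWORD)((tree->nentries*8)&0xFFFFFFFF),"RASM_Hash");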
    if( HashFileMap == NULL )
      return 0;
    hashtable = (BITBOARD *)MapViewOfFile(HashFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( hashtable == NULL )
      return 0;
  }
  else { /* Slaves also try to attach to the hashtable */
    HANDLE HashFileMap;
    HashFileMap = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,"RASM_Hash");
    if( HashFileMap == NULL )
      return 0;
    hashtable = (BITBOARD *)MapViewOfFile(HashFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( hashtable == NULL )
      return 0;
  }
  #endif
  return 1;
}

int StartProcesses(int ncpus) {
  char buf[256];
  int i;
  /* returns 1 if ncpus-1 started ok */
  if( ncpus == 1 )
    return 1;

  for( i = 1 ; i < ncpus ; i++ ) {
    sprintf(buf,"%i_%i",i+1,ncpus);
    #if UNIX
    if( !fork() ) {
      execl(rasmexename,rasmexename,buf,(char *)NULL);
      _exit(1); /* only reached when execl fails; keeps the child out of this loop */
    }
    #else
    (void)_spawnl(_P_NOWAIT,rasmexename,rasmexename,buf,NULL);
     #endif
  }
  return 1;
}

void InitTree(int ncpus) {
  int i;

  for( i = 0 ; i < ncpus ; i++ ) {
    tree->ps[i].status   = STATUS_NOTSTARTED;
    tree->ps[i].readread = 0;
  }
}

void WaitForStatus(int ncpus,int waitforstate) {
  /* wait for all processors to have the same state */
  int i,badluck=1;

  while( badluck ) {
    badluck = 0;
    for( i = 0 ; i < ncpus ; i++ ) {
      if( tree->ps[i].status != waitforstate )
        badluck = 1;
    }
  }
}

void PutStatus(int ncpus,int statenew) {
  int i;
  for( i = 0 ; i < ncpus ; i++ ) {
    tree->ps[i].status = statenew;
  }
}

int CheckAllStatus(int ncpus,int status) {
  /* Tries with a single loop to determine whether the other cpu's also finished
   *
   * returns:
   *     true  ==> when all the processes have this status
   *     false ==> when 1 or more are still busy measuring
   */
  int i,badluck=1;
  for( i = 0 ; i < ncpus ; i++ ) {
    if( tree->ps[i].status != status ) {
      badluck = 0;
      break;
    }
  }
  return badluck;
}

void Slapen(int ms) {
  #if UNIX
  usleep(ms*1000); /* usleep takes microseconds */
  #else
  Sleep(ms);     /* Sleep takes milliseconds */
  #endif
}

float LoopRandom(void) {
  BITBOARD n,nps_rng;
  float fns;
  int timetaken;
  printf("Benchmarking random RNG test. Please wait..\n");
  n = 25000000; // 25 mln; doubled to 50 mln before the first run
  timetaken = 0;
  while( timetaken < 500 ) {
    n += n;
    timetaken = DoNrng(n);
  }
  printf("timetaken=%i\n",timetaken);
  nps_rng = (1000*n) / (BITBOARD)timetaken;
  fns  = ToNano(nps_rng);
  printf("Machine needs %f ns for RND loop\n",fns);
  return fns;
}


/* Main program: parses the arguments, sets up shared memory and runs the test */
int main(int argc,char *argv[]) {
  /* allocate a big memory buffer parameter is in bytes.
   * don't hesitate to MODIFY this to how many gigabytes
   * you want to try.
   * The more the better i keep saying to myself.
   *
   * Note that under linux your maximum shared memory limit can be set with:
   *
   * echo <size> > /proc/sys/kernel/shmmax
   *
   * and under IRIX it is usually 80% from the total RAM onboard that can get
   * allocated
   */

  BITBOARD nbytes,firstguess;
  float ns_rng,f_loop;
  int cpus,tottimes,t1,t2;


  if( argc <= 1 ) {
    printf("Latency test usage is: latency <buffer> <cpus>\n");
    printf("Where 'buffer' is the buffer in number of bytes to allocate\n");
    printf("and where 'cpus' is the number of processes that this test will try to use (1 = default) \n");
    return 1;
  }

  /* parse the input */
  nbytes = 0;
  cpus   = 1; // default

  if( strchr(argv[1],'_') == NULL ) { /* main startup process */
    int np = 0;
    #if UNIX
     #if FREEBSD
     nbytes = (BITBOARD)atoi(argv[1]); // freebsd doesn't support > 2 GB memory
     #else
     nbytes = (BITBOARD)atoll(argv[1]);
     #endif
    #else
    nbytes = (BITBOARD)_atoi64(argv[1]);
    #endif

    printf("Welcome to RASM Latency!\n");
    printf("RASML measures the RANDOM AVERAGE SHARED MEMORY LATENCY!\n\n");

    if( argc > 2 ) {
      cpus = 0;
      do {
        cpus *= 10;
        cpus += (int)(argv[2][np]-'1')+1;
        np++;
      } while( argv[2][np] >= '0' && argv[2][np] <= '9' );
    }
    //printf("Master: buffer = %s bytes. #CPUs = %i\n",To64(nbytes),cpus);
    ProcessNumber = 0;

    /* check whether we are not getting out of bounds */
    if( cpus > MAXPROCESSES ) {
      printf("Error: Recompile with a bigger value for MAXPROCESSES. %i processors is too much\n",cpus);
      return 1;
    }

    /* find out the file name */
    #if UNIX
    strcpy(rasmexename,argv[0]);
    #else
    GetModuleFileName(NULL,rasmexename,2044);
    #endif
    printf("Stored in rasmexename = %s\n",rasmexename);
  }
  else { //   latency 2_452  ==>  means processor 2 out of 452.
    int np = 0;

    ProcessNumber = 0;
    do {
      ProcessNumber *= 10;
      ProcessNumber += (argv[1][np]-'1')+1;      // n
      np++;
    } while( argv[1][np] >= '0' && argv[1][np] <= '9' );

    ProcessNumber--; // 1 less because of ProcessNumber ==> [0..n-1]

    np++; // skip underscore

    cpus = 0;
    do {
      cpus *= 10;
      cpus += (argv[1][np]-'1')+1;      // n
      np++;
    } while( argv[1][np] >= '0' && argv[1][np] <= '9' );
    //printf("Slave: ProcessNumber=%i cpus=%i\n",ProcessNumber,cpus);
  }

  /* first we setup the random number generator. */
  RanrotAInit();

  /* initialize shared memory tree; it gets used for communication between the
   * processes */
  if( !AllocateTree() ) {
    printf("Error: ProcessNumber %i could not allocate the tree\n",ProcessNumber);
    return 1;
  }

  if( !ProcessNumber )
    ParseBuffer(nbytes);

  nentries = tree->nentries;

  /* Now some stuff only the Master has to do */
  if( !ProcessNumber ) {
    /* Master: now let's time the pseudo random generators speed in nanoseconds
a call */
    ns_rng = TimeRandom();
    f_loop = LoopRandom();

    printf("Trying to Allocate Buffer\n");
    t1 = GetClock();
    if( !AllocateHash() ) {
      printf("Error: Could not allocate buffer!\n");
      return 1;
    }
    t2 = GetClock();
    printf("Took %i.%03i seconds to allocate Hash\n",(t2-t1)/1000,(t2-t1)%1000);
    ClearHash();
    t1 = GetClock();
    printf("Took %i.%03i seconds to clear Hash\n",(t1-t2)/1000,(t1-t2)%1000);

    /* so now hashtable is setup and we know quite some stuff. So it is time to
     * start all other processes */
    InitTree(cpus);

    printf("Starting Other processes\n");
    t1 = GetClock();
    if( !StartProcesses(cpus) ) {
      printf("Error: Could not start processes\n");
      DeAllocate();
    }
  }
  else { /* all Slaves do this */
    if( !AllocateHash() ) {
      printf("Error: slave %i Could not allocate buffer!\n",ProcessNumber);
      return 1;
    }
  }

  tree->ps[ProcessNumber].status = STATUS_READ;

  if( !ProcessNumber ) {
    WaitForStatus(cpus,STATUS_READ);
    t2 = GetClock();
    printf("Took %i milliseconds to start %i additional processes\n",t2-t1,cpus-1);
    printf("Read latency measurement STARTS NOW using steps of 2 * %i.%03i seconds :\n",
     (SWITCHTIME/1000),(SWITCHTIME%1000));
  }

  firstguess = 200000;
  tottimes   = 0;

  for( ;; ) {
    int timetaken = 0;
    if( tree->ps[ProcessNumber].status == STATUS_MEASUREREAD ) {
      /* this really MEASURES the readread */
      BITBOARD ntried = 0,avnumber;
      int totaltime=0;
      while( totaltime < SWITCHTIME ) { /* go measure for around SWITCHTIME milliseconds */
        totaltime += DoNreads(firstguess);
        ntried += firstguess;
      }
      /* now put the average number of readreads into the shared memory */
      avnumber = (ntried*1000) / (BITBOARD)totaltime;
      tree->ps[ProcessNumber].readread = avnumber;

      /* show that it is finished */
      tree->ps[ProcessNumber].status = STATUS_MEASUREDREAD;

      /* now keep doing the same thing until status gets modified */
      while( tree->ps[ProcessNumber].status == STATUS_MEASUREDREAD ) {
        (void)DoNreads(firstguess);
        if( !ProcessNumber ) {
          if( CheckAllStatus(cpus,STATUS_MEASUREDREAD) ) {
            PutStatus(cpus,STATUS_QUIT);
            break;
          }
        }
      }
    }
    else if( tree->ps[ProcessNumber].status == STATUS_READ ) {
      BITBOARD nextguess;
      /* now software must try to determine how many reads a second are
       * possible for that process
       */
      //printf("proc=%i trying %s reads\n",ProcessNumber,To64(firstguess));
      timetaken = DoNreads(firstguess);
      /* try to guess such that next test takes 1 second, or if test was too
       * inaccurate then double the number simply. also prevents divide by
       * zero error ;)
       */
      if( timetaken < 400 )
        nextguess = firstguess*2;
      else
        nextguess = (firstguess*1000)/(BITBOARD)timetaken;
      firstguess = nextguess;
      if( !ProcessNumber ) {
        tottimes += timetaken;
        if( tottimes >= SWITCHTIME ) { // 30 seconds to a few minutes
          PutStatus(cpus,STATUS_MEASUREREAD);
          //PutStatus(cpus,STATUS_QUIT);
          tottimes = 0;
        }
      }
    }
    else if( tree->ps[ProcessNumber].status == STATUS_QUIT )
      break;
  }

  /* now do the latency tests
   */
  //TestLatency(ns_rng);
  tree->ps[ProcessNumber].status = STATUS_QUIT;
  if( !ProcessNumber ) {
    BITBOARD averagereadread;
    int i;
    averagereadread = 0;
    WaitForStatus(cpus,STATUS_QUIT);
    for( i = 0; i < cpus ; i++ ) {
      averagereadread += tree->ps[i].readread;
    }
    averagereadread /= (BITBOARD)cpus;
    printf("Raw Average measured read read time at %i processes = %f ns\n",cpus,ToNano(averagereadread));
    printf("Now for the final calculation it gets compensated:\n");
    printf("  Average measured read read time at %i processes = %f ns\n",cpus,ToNano(averagereadread)-f_loop);
  }

  DeAllocate();
  return 0;
}

/* EOF latency.c */