Author: Vincent Diepeveen
Date: 03:24:58 07/15/03
On July 14, 2003 at 16:07:27, Robert Hyatt wrote:

You measure the latency with those benchmarks using sequential reads. Data in cache lines that are already open comes back faster than random reads to memory. Random reads to memory take about 280 ns on a single-CPU P4 and about 400 ns on dual P4s.

I will now post my source code here to measure it. It works with Visual C++ as well as on *nix systems. Compile it and run it, for example, with a buffer of 500 MB and 2 processors:

c:\win2000\> latency 500000000 2

/*-----------------10-6-2003 3:48-------------------*
 *
 * This program rasml.c measures the Random Average Shared Memory Latency (RASML)
 * Thanks to Agner Fog for his excellent random number generator.
 *
 * This testset is using a heavily optimized and to 64 bits modified version
 * of Agner Fog's ranrot generator.
 *
 * Created by Vincent Diepeveen who is author of this and therefore has
 * the copyright.
 *
 * Nevertheless i encourage persons to use this test UNMODIFIED. Its intention is
 * to measure the average latency to read and write data to shared memory at all the
 * processors at the same time.
 *
 * What it does is allocate a big block of memory (gigabytes or
 * terabytes preferably), and then n processes go either read from that
 * memory in a RANDOM way, and another test is reading AND writing
 * in a random way. All the processors perform the same action. They
 * keep the results and write them back to shared memory. Then all the processes
 * except P0 quit. P0 then calculates over all the processors the average
 * and it will show it clearly printed to the screen expressed in nanoseconds.
 *
 * Of course the smallest datasize used in this testset is 64 bits.
 * I wouldn't know how else to access more than 2^32 bytes.
 *
 * There are many things to consider when doing such tests. Like Level1 cache, Level2 cache.
 * Caches at routers and another big bunch of tricks.
 * The caches i clearly mention here
 * because a lookup might by accident already have been done before
 * by the same processor or by another processor in the same node that uses the same RAM.
 *
 * Another influence on the times calculated is caused by the random number generator.
 *
 * Currently it gets very primitive initialized.
 *
 * There is a big need for this test i feel. In the future more and more Artificial Intelligence
 * and/or searching software will be there. They all will be busy doing a lot of random accesses
 * to the RAM.
 *
 * The original reason to create this testset is very sad.
 *   "The paper supports everything"
 *   (Arturo Ochoa at Caracas, Venezuela)
 *
 * Especially of course when you never actually test the latency. A few quick searches at the
 * internet already show that paper supports everything with regards to latency.
 *
 * Copyrights: i have extensively searched past year after 'random average shared memory latencies'.
 * I found nothing that has to do with memory latencies in general even *approaching* reality where
 * programmers despite all the paper latencies must deal with.
 *
 * Therefore i claim unconditional definition rights at 'random average shared memory latency' (RASML).
 * In order to measure and publish random memory latencies, this source code without written
 * permission by me, may not get modified.
 *
 * In that way i avoid the usual problems that are there in supercomputing currently
 * where marketing managers use their own definition of the word 'latency'.
 *
 * Currently the word latency by marketing managers is most likely 'the speed that i imagine
 * my product might be able to achieve at a certain component of a smaller version of
 * the machine, without taking into account inferior parts of the computer which
 * prevent such fantastic latency numbers in practice'.
 *
 * Vincent Diepeveen diep@xs4all.nl
 * Veenendaal, The Netherlands 10 june 2003
 *
 * first a few lines about the random number generator. Note that I modified it
 * very slightly.
 * Basically its initialization has been done better and some dead
 * slow FPU code was removed.
 */

#define UNIX 0 /* put to 1 when you are under unix or using gcc-lookalike compilers */
#define IRIX 0 /* this value only matters when UNIX is set to 1. For Linux put to 0
                * basically allocating shared memory in linux is pretty buggy done in
                * its kernel.
                *
                * Therefore you might want to do 'cat /proc/sys/kernel/shmmax'
                * and look for yourself how much shared memory YOU can allocate in linux.
                *
                * If that is not enough to benchmark this program then try modifying it with:
                *   echo <newsize> > /proc/sys/kernel/shmmax
                * Be sure you are root when doing that each time the system boots.
                */
#define FREEBSD 1 // be sure to not use more than 2 GB memory with freebsd with this test. sorry.

#if UNIX
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/times.h>
#include <sys/time.h>
#include <unistd.h>
#else
#include <windows.h>
#include <winbase.h> // for GetTickCount()
#include <process.h> // _spawnl
#endif
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define SWITCHTIME 300000 /* in milliseconds. Modify this to let a test run longer.
                           * basically it is a good idea to use about the cpu number times
                           * thousand for this. 30 seconds is fine for PC's, but a very
                           * bad idea for supercomputers. I recommend several minutes
                           * there. Of course that lets a test take way way longer.
                           */
#define MAXPROCESSES 2048 /* this test can go up to this amount of processes to be tested */
#define CACHELINELENGTH 128 /* cache line length at the machine.
                             * Modify this if you want to */

#if UNIX
#include <memory.h>
#define FORCEINLINE __inline
/* UNIX and such this is 64 bits unsigned variable: */
#define BITBOARD unsigned long long
#else
#define FORCEINLINE __forceinline
/* in WINDOWS we also want to be 64 bits: */
#define BITBOARD unsigned _int64
#endif

#define STATUS_NOTSTARTED 0
#define STATUS_READ 1
#define STATUS_MEASUREREAD 2
#define STATUS_MEASUREDREAD 3
#define STATUS_QUIT 10

struct ProcessState {
  volatile int status; /* 0 = not started yet
                        * 1 = ready to start reading
                        *
                        * 10 = quitted
                        * */
  /* now the numbers each cpu gathers. The name of the first number is what
   * cpu0 is doing and the second name what all the other cpu's were doing at that
   * time */
  volatile BITBOARD readread; /* */
  char dummycacheline[CACHELINELENGTH];
};

typedef struct {
  BITBOARD nentries; // number of entries of 64 bits used for cache.
  struct ProcessState ps[MAXPROCESSES];
} GlobalTree;

void RanrotAInit(void);
float ToNano(BITBOARD);
int GetClock(void);
float TimeRandom(void);
void ParseBuffer(BITBOARD);
void ClearHash(void);
void DeAllocate(void);
int DoNrng(BITBOARD);
int DoNreads(BITBOARD);
int DoNreadwrites(BITBOARD);
void TestLatency(float);
int AllocateTree(void);
void InitTree(int);
void WaitForStatus(int,int);
void PutStatus(int,int);
int CheckAllStatus(int,int);
void Slapen(int);
float LoopRandom(void);

/* define parameters (R1 and R2 must be smaller than the integer size): */
#define KK 17
#define JJ 10
#define R1 5
#define R2 3

/* global variables Ranrot */
BITBOARD randbuffer[KK+3] = { /* history buffer filled with some random numbers */
  0x92930cb295f24dab,0x0d2f2c860b685215,0x4ef7b8f8e76ccae7,0x03519154af3ec239,0x195e36fe715fad23,
  0x86f2729c24a590ad,0x9ff2414a69e4b5ef,0x631205a6bf456141,0x6de386f196bc1b7b,0x5db2d651a7bdf825,
  0x0d2f2c86c1de75b7,0x5f72ed908858a9c9,0xfb2629812da87693,0xf3088fedb657f9dd,0x00d47d10ffdc8a9f,
  0xd9e323088121da71,0x801600328b823ecb,0x93c300e4885d05f5,0x096d1f3b4e20cd47,0x43d64ed75a9ad5d9
  /*0xa05a7755512c0c03,0x960880d9ea857ccd,0x7d9c520a4cc1d30f,0x73b1eb7d8891a8a1,0x116e3fc3a6b7aadb*/
};
int r_p1, r_p2; /* indexes into history buffer */

/* global variables RASML */
BITBOARD *hashtable,nentries,globaldummy=0;
GlobalTree *tree;
int ProcessNumber;
#if UNIX
int shm_tree,shm_hash;
#endif
char rasmexename[2048];

/******************************************************** AgF 1999-03-03 *
 * Random Number generator 'RANROT' type B
 * by Agner Fog
 *
 * This is a lagged-Fibonacci type of random number generator with
 * rotation of bits. The algorithm is:
 * X[n] = ((X[n-j] rotl r1) + (X[n-k] rotl r2)) modulo 2^b
 *
 * The last k values of X are stored in a circular buffer named
 * randbuffer.
 *
 * This version works with any integer size: 16, 32, 64 bits etc.
 * The integers must be unsigned. The resolution depends on the integer
 * size.
 *
 * Note that the function RanrotAInit must be called before the first
 * call to RanrotA or iRanrotA
 *
 * The theory of the RANROT type of generators is described at
 * www.agner.org/random/ranrot.htm
 *
 *************************************************************************/

FORCEINLINE BITBOARD rotl(BITBOARD x,int r) {return(x<<r)|(x>>(64-r));}

/* returns a random number of 64 bits unsigned */
FORCEINLINE BITBOARD RanrotA(void) {
  /* generate next random number */
  BITBOARD x = randbuffer[r_p1] = rotl(randbuffer[r_p2],R1) + rotl(randbuffer[r_p1], R2);
  /* rotate list pointers */
  if( --r_p1 < 0)
    r_p1 = KK - 1;
  if( --r_p2 < 0 )
    r_p2 = KK - 1;
  return x;
}

/* this function initializes the random number generator.
 */
void RanrotAInit(void) {
  int i;
  /* one can fill the randbuffer here with possible other values here */

  /* initialize pointers to circular buffer */
  r_p1 = 0;
  r_p2 = JJ;

  /* randomize */
  for( i = 0; i < 300; i++ )
    (void)RanrotA();
}

/* Now the RASML code */
char *To64(BITBOARD x) {
  static char buf[256];
  char *sb;
  sb = &buf[0];
#if UNIX
  sprintf(buf,"%llu",x);
#else
  sprintf(buf,"%I64u",x);
#endif
  return sb;
}

int GetClock(void) {
  /* The accuracy is measured in milliseconds. The used function is very accurate according
   * to the NT team, way more accurate nowadays than mentioned in the MSDN manual. The accuracy
   * for linux or unix we can only guess. Too many experts there.
   */
#if UNIX
  struct timeval timeval;
  struct timezone timezone;
  gettimeofday(&timeval, &timezone);
  return((int)(timeval.tv_sec*1000+(timeval.tv_usec/1000)));
#else
  return((int)GetTickCount());
#endif
}

float ToNano(BITBOARD nps) {
  /* convert something from times a second to nanoseconds.
   * NOTE THAT THERE IS COMPILER BUGS SOMETIMES AT OLD COMPILERS
   * SO THAT'S WHY MY CODE ISN'T A 1 LINE RETURN HERE. PLEASE DO
   * NOT MODIFY THIS CODE */
  float tn;
  tn = 1000000000/(float)nps;
  return tn;
}

float TimeRandom(void) {
  /* timing the random number generator is very easy of course. Returns
   * number of random numbers a second that can get generated */
  BITBOARD bb=0,i,value,nps;
  float ns_rng;
  int t1,t2,took;

  printf("Benchmarking Pseudo Random Number Generator speed, RanRot type 'B'!\n");
  printf("Speed depends upon CPU and compile options from RASML,\n therefore we benchmark the RNG\n");
  printf("Please wait a few seconds.. ");
  fflush(stdout);
  value = 100000;
  took = 0;
  while( took < 3000 ) {
    value <<= 2; // x4
    t1 = GetClock();
    for( i = 0; i < value; i++ ) {
      bb ^= RanrotA();
    }
    t2 = GetClock();
    took = t2-t1;
  }
  nps = (1000*value)/(BITBOARD)took;
#if UNIX
  printf("..took %i milliseconds to generate %llu numbers\n",took,value);
  printf("Speed of RNG = %llu numbers a second\n",nps);
#else
  /* the original format string read "%I64" here, without the 'u' conversion */
  printf("..took %i milliseconds to generate %I64u numbers\n",took,value);
  printf("Speed of RNG = %I64u numbers a second\n",nps);
#endif
  ns_rng = ToNano(nps);
  printf("So 1 RNG call takes %f nanoseconds\n",ns_rng);
  return ns_rng;
}

void ParseBuffer(BITBOARD nbytes) {
  tree->nentries = nbytes/sizeof(BITBOARD);
#if UNIX
  printf("Trying to allocate %llu entries. ",tree->nentries);
  printf("In total %llu bytes\n",tree->nentries*(BITBOARD)sizeof(BITBOARD));
#else
  printf("Trying to allocate %s entries. ",To64(tree->nentries));
  printf("In total %s bytes\n",To64(tree->nentries*(BITBOARD)sizeof(BITBOARD)));
#endif
}

void ClearHash(void) {
  BITBOARD i,nentries = tree->nentries;
  /* clearing hashtable */
  printf("Clearing hashtable\n");
  for( i = 0 ; i < nentries ; i++ ) /* very unoptimized way of clearing */
    hashtable[i] = i;
}

void DeAllocate(void) {
#if UNIX
  shmctl(shm_tree,IPC_RMID,0);
  shmctl(shm_hash,IPC_RMID,0);
#else
  UnmapViewOfFile(tree);
  UnmapViewOfFile(hashtable);
#endif
}

int DoNrng(BITBOARD n) {
  BITBOARD i=1,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;
  t1 = GetClock();
  do {
    BITBOARD index = RanrotA()%nents;
    dummyres ^= index;
  } while( i++ < n );
  t2 = GetClock();
  globaldummy = dummyres;
  return(t2-t1);
}

int DoNreads(BITBOARD n) {
  BITBOARD i=1,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;
  t1 = GetClock();
  do {
    BITBOARD index = RanrotA()%nents;
    dummyres ^= hashtable[index];
  } while( i++ < n );
  t2 = GetClock();
  globaldummy = dummyres;
  return(t2-t1);
}

int DoNreadwrites(BITBOARD n) {
  BITBOARD i=1,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;
  t1 = GetClock();
  do {
    BITBOARD index = RanrotA()%nents;
    dummyres ^= hashtable[index];
    hashtable[index] = dummyres;
  } while( i++ < n );
  t2 = GetClock();
  globaldummy = dummyres;
  return(t2-t1);
}

void TestLatency(float ns_rng) {
  BITBOARD n,nps_read,nps_rw,nps_rng;
  float ns,fns;
  int timetaken;

  printf("Doing random RNG test. Please wait..\n");
  n = 50000000; // 50 mln
  timetaken = DoNrng(n);
  nps_rng = (1000*n) / (BITBOARD)timetaken;
  fns = ToNano(nps_rng);
  printf("Machine needs %f ns for RND loop\n",fns);

  /* READING SINGLE CPU RANDOM ENTRIES */
  printf("Doing random read tests single cpu. Please wait..\n");
  n = 100000000; // 100 mln
  timetaken = DoNreads(n);
  nps_read = (1000*n) / (BITBOARD)timetaken;
  ns = ToNano(nps_read);
  printf("Machine needs %f ns for single cpu random reads.\nExtrapolated=%f nanoseconds a read\n",ns,ns-fns);

  /* READING AND THEN WRITING SINGLE CPU RANDOM ENTRIES */
  printf("Doing random readwrite tests single cpu. Please wait..\n");
  n = 100000000; // 100 mln
  timetaken = DoNreadwrites(n);
  nps_rw = (1000*n) / (BITBOARD)timetaken;
  ns = ToNano(nps_rw);
  printf("Machine needs %f ns for single cpu random readwrites.\n",ns);
  printf("Extrapolated=%f nanoseconds a readwrite (to the same slot)\n\n",ns-fns);
  printf("So far the useless tests.\nBut we have vague read/write nodes a second numbers now\n");
}

int AllocateTree(void) { /* initialize the tree. returns 0 if error */
#if UNIX
  shm_tree = shmget(
#if IRIX
    ftok(".",'t'),
#else
    IPC_PRIVATE,
#endif
    sizeof(GlobalTree),IPC_CREAT|0777);
  if( shm_tree == -1 )
    return 0;
  tree = (GlobalTree *)shmat(shm_tree,0,0);
  if( tree == (GlobalTree *)-1 )
    return 0;
#else
  /* so windows NT.
   * This might even work under win98 and such crap OSes, but not win95 */
  if( !ProcessNumber ) {
    HANDLE TreeFileMap;
    TreeFileMap = CreateFileMapping((HANDLE)0xFFFFFFFF,NULL,PAGE_READWRITE,0,
                                    (DWORD)sizeof(GlobalTree),"RASM_Tree");
    if( TreeFileMap == NULL )
      return 0;
    tree = (GlobalTree *)MapViewOfFile(TreeFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( tree == NULL )
      return 0;
  }
  else { /* Slaves also try to attach to the tree */
    HANDLE TreeFileMap;
    TreeFileMap = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,"RASM_Tree");
    if( TreeFileMap == NULL )
      return 0;
    tree = (GlobalTree *)MapViewOfFile(TreeFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( tree == NULL )
      return 0;
  }
#endif
  return 1;
}

int AllocateHash(void) { /* initialize the hashtable (cache). returns 0 if error */
#if UNIX
  shm_hash = shmget(
#if IRIX
    ftok(".",'h'),
#else
    IPC_PRIVATE,
#endif
    tree->nentries*8,IPC_CREAT|0777);
  if( shm_hash == -1 )
    return 0;
  hashtable = (BITBOARD *)shmat(shm_hash,0,0);
  if( hashtable == (BITBOARD *)-1 )
    return 0;
#else
  /* so windows NT.
   * This might even work under win98 and such crap OSes, but not win95 */
  if( !ProcessNumber ) {
    HANDLE HashFileMap;
    HashFileMap = CreateFileMapping((HANDLE)0xFFFFFFFF,NULL,PAGE_READWRITE,0,
                                    (DWORD)tree->nentries*8,"RASM_Hash");
    if( HashFileMap == NULL )
      return 0;
    hashtable = (BITBOARD *)MapViewOfFile(HashFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( hashtable == NULL )
      return 0;
  }
  else { /* Slaves also try to attach to the hashtable */
    HANDLE HashFileMap;
    HashFileMap = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,"RASM_Hash");
    if( HashFileMap == NULL )
      return 0;
    hashtable = (BITBOARD *)MapViewOfFile(HashFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
    if( hashtable == NULL )
      return 0;
  }
#endif
  return 1;
}

int StartProcesses(int ncpus) {
  char buf[256];
  int i;

  /* returns 1 if ncpus-1 started ok */
  if( ncpus == 1 )
    return 1;
  for( i = 1 ; i < ncpus ; i++ ) {
    sprintf(buf,"%i_%i",i+1,ncpus);
#if UNIX
    if( !fork() )
      execl(rasmexename,rasmexename,buf,NULL);
#else
    (void)_spawnl(_P_NOWAIT,rasmexename,rasmexename,buf,NULL);
#endif
  }
  return 1;
}

void InitTree(int ncpus) {
  int i;
  for( i = 0 ; i < ncpus ; i++ ) {
    tree->ps[i].status = STATUS_NOTSTARTED;
    tree->ps[i].readread = 0;
  }
}

void WaitForStatus(int ncpus,int waitforstate) {
  /* wait for all processors to have the same state */
  int i,badluck=1;
  while( badluck ) {
    badluck = 0;
    for( i = 0 ; i < ncpus ; i++ ) {
      if( tree->ps[i].status != waitforstate )
        badluck = 1;
    }
  }
}

void PutStatus(int ncpus,int statenew) {
  int i;
  for( i = 0 ; i < ncpus ; i++ ) {
    tree->ps[i].status = statenew;
  }
}

int CheckAllStatus(int ncpus,int status) {
  /* Tries with a single loop to determine whether the other cpu's also finished
   *
   * returns:
   *   true  ==> when all the processes have this status
   *   false ==> when 1 or more are still busy measuring
   */
  int i,badluck=1;
  for( i = 0 ; i < ncpus ; i++ ) {
    if( tree->ps[i].status != status ) {
      badluck = 0;
      break;
    }
  }
  return badluck;
}

void Slapen(int ms) {
#if UNIX
  usleep(ms*1000); /* 0.050 000 seconds, it is in microseconds!
                    */
#else
  Sleep(ms); /* 0.050 seconds, it is in milliseconds */
#endif
}

float LoopRandom(void) {
  BITBOARD n,nps_rng;
  float fns;
  int timetaken;

  printf("Benchmarking random RNG test. Please wait..\n");
  n = 25000000; // doubled before the first run, so the first run does 50 mln
  timetaken = 0;
  while( timetaken < 500 ) {
    n += n;
    timetaken = DoNrng(n);
  }
  printf("timetaken=%i\n",timetaken);
  nps_rng = (1000*n) / (BITBOARD)timetaken;
  fns = ToNano(nps_rng);
  printf("Machine needs %f ns for RND loop\n",fns);
  return fns;
}

/* Example showing how to use the random number generator: */
int main(int argc,char *argv[]) {
  /* allocate a big memory buffer parameter is in bytes.
   * don't hesitate to MODIFY this to how many gigabytes
   * you want to try.
   * The more the better i keep saying to myself.
   *
   * Note that under linux your maximum shared memory limit can be set with:
   *
   *   echo <size> > /proc/sys/kernel/shmmax
   *
   * and under IRIX it is usually 80% from the total RAM onboard that can get allocated
   */
  BITBOARD nbytes,firstguess;
  float ns_rng,f_loop;
  int cpus,tottimes,t1,t2;

  if( argc <= 1 ) {
    printf("Latency test usage is: latency <buffer> <cpus>\n");
    printf("Where 'buffer' is the buffer in number of bytes to allocate\n");
    printf("and where 'cpus' is the number of processes that this test will try to use (1 = default) \n");
    return 1;
  }

  /* parse the input */
  nbytes = 0;
  cpus = 1; // default
  if( strchr(argv[1],'_') == NULL ) { /* main startup process */
    int np = 0;
#if UNIX
#if FREEBSD
    nbytes = (BITBOARD)atoi(argv[1]); // freebsd doesn't support > 2 GB memory
#else
    nbytes = (BITBOARD)atoll(argv[1]);
#endif
#else
    nbytes = (BITBOARD)_atoi64(argv[1]);
#endif
    printf("Welcome to RASM Latency!\n");
    printf("RASML measures the RANDOM AVERAGE SHARED MEMORY LATENCY!\n\n");
    if( argc > 2 ) {
      cpus = 0;
      do {
        cpus *= 10;
        cpus += (int)(argv[2][np]-'1')+1;
        np++;
      } while( argv[2][np] >= '0' && argv[2][np] <= '9' );
    }
    //printf("Master: buffer = %s bytes. #CPUs = %i\n",To64(nbytes),cpus);
    ProcessNumber = 0;

    /* check whether we are not getting out of bounds */
    if( cpus > MAXPROCESSES ) {
      printf("Error: Recompile with a bigger stack for MAXPROCESSES. %i processors is too much\n",cpus);
      return 1;
    }

    /* find out the file name */
#if UNIX
    strcpy(rasmexename,argv[0]);
#else
    GetModuleFileName(NULL,rasmexename,2044);
#endif
    printf("Stored in rasmexename = %s\n",rasmexename);
  }
  else { // latency 2_452 ==> means processor 2 out of 452.
    int np = 0;
    ProcessNumber = 0;
    do {
      ProcessNumber *= 10;
      ProcessNumber += (argv[1][np]-'1')+1; // n
      np++;
    } while( argv[1][np] >= '0' && argv[1][np] <= '9' );
    ProcessNumber--; // 1 less because of ProcessNumber ==> [0..n-1]
    np++; // skip underscore
    cpus = 0;
    do {
      cpus *= 10;
      cpus += (argv[1][np]-'1')+1; // n
      np++;
    } while( argv[1][np] >= '0' && argv[1][np] <= '9' );
    //printf("Slave: ProcessNumber=%i cpus=%i\n",ProcessNumber,cpus);
  }

  /* first we setup the random number generator. */
  RanrotAInit();

  /* initialize shared memory tree; it gets used for communication between the processes */
  if( !AllocateTree() ) {
    printf("Error: ProcessNumber %i could not allocate the tree\n",ProcessNumber);
    return 1;
  }
  if( !ProcessNumber )
    ParseBuffer(nbytes);
  nentries = tree->nentries;

  /* Now some stuff only the Master has to do */
  if( !ProcessNumber ) {
    /* Master: now let's time the pseudo random generators speed in nanoseconds a call */
    ns_rng = TimeRandom();
    f_loop = LoopRandom();
    printf("Trying to Allocate Buffer\n");
    t1 = GetClock();
    if( !AllocateHash() ) {
      printf("Error: Could not allocate buffer!\n");
      return 1;
    }
    t2 = GetClock();
    printf("Took %i.%03i seconds to allocate Hash\n",(t2-t1)/1000,(t2-t1)%1000);
    ClearHash();
    t1 = GetClock();
    printf("Took %i.%03i seconds to clear Hash\n",(t1-t2)/1000,(t1-t2)%1000);

    /* so now hashtable is setup and we know quite some stuff.
     * So it is time to
     * start all other processes */
    InitTree(cpus);
    printf("Starting Other processes\n");
    t1 = GetClock();
    if( !StartProcesses(cpus) ) {
      printf("Error: Could not start processes\n");
      DeAllocate();
    }
  }
  else { /* all Slaves do this */
    if( !AllocateHash() ) {
      printf("Error: slave %i Could not allocate buffer!\n",ProcessNumber);
      return 1;
    }
  }

  tree->ps[ProcessNumber].status = STATUS_READ;
  if( !ProcessNumber ) {
    WaitForStatus(cpus,STATUS_READ);
    t2 = GetClock();
    printf("Took %i milliseconds to start %i additional processes\n",t2-t1,cpus-1);
    printf("Read latency measurement STARTS NOW using steps of 2 * %i.%03i seconds :\n",
           (SWITCHTIME/1000),(SWITCHTIME%1000));
  }

  firstguess = 200000;
  tottimes = 0;
  for( ;; ) {
    int timetaken = 0;
    if( tree->ps[ProcessNumber].status == STATUS_MEASUREREAD ) {
      /* this really MEASURES the readread */
      BITBOARD ntried = 0,avnumber;
      int totaltime=0;
      while( totaltime < SWITCHTIME ) { /* go measure around switchtime seconds */
        totaltime += DoNreads(firstguess);
        ntried += firstguess;
      }

      /* now put the average number of readreads into the shared memory */
      avnumber = (ntried*1000) / (BITBOARD)totaltime;
      tree->ps[ProcessNumber].readread = avnumber;

      /* show that it is finished */
      tree->ps[ProcessNumber].status = STATUS_MEASUREDREAD;

      /* now keep doing the same thing until status gets modified */
      while( tree->ps[ProcessNumber].status == STATUS_MEASUREDREAD ) {
        (void)DoNreads(firstguess);
        if( !ProcessNumber ) {
          if( CheckAllStatus(cpus,STATUS_MEASUREDREAD) ) {
            PutStatus(cpus,STATUS_QUIT);
            break;
          }
        }
      }
    }
    else if( tree->ps[ProcessNumber].status == STATUS_READ ) {
      BITBOARD nextguess;
      /* now software must try to determine how many reads a second are possible for that
       * process */
      //printf("proc=%i trying %s reads\n",ProcessNumber,To64(firstguess));
      timetaken = DoNreads(firstguess);

      /* try to guess such that next test takes 1 second, or if test was too inaccurate
       * then double the number simply.
       * also prevents divide by zero error ;) */
      if( timetaken < 400 )
        nextguess = firstguess*2;
      else
        nextguess = (firstguess*1000)/(BITBOARD)timetaken;
      firstguess = nextguess;
      if( !ProcessNumber ) {
        tottimes += timetaken;
        if( tottimes >= SWITCHTIME ) { // 30 seconds to a few minutes
          PutStatus(cpus,STATUS_MEASUREREAD);
          //PutStatus(cpus,STATUS_QUIT);
          tottimes = 0;
        }
      }
    }
    else if( tree->ps[ProcessNumber].status == STATUS_QUIT )
      break;
  }

  /* now do the latency tests */
  //TestLatency(ns_rng);
  tree->ps[ProcessNumber].status = STATUS_QUIT;
  if( !ProcessNumber ) {
    BITBOARD averagereadread;
    int i;

    averagereadread = 0;
    WaitForStatus(cpus,STATUS_QUIT);
    for( i = 0; i < cpus ; i++ ) {
      averagereadread += tree->ps[i].readread;
    }
    averagereadread /= (BITBOARD)cpus;
    printf("Raw Average measured read read time at %i processes = %f ns\n",cpus,ToNano(averagereadread));
    printf("Now for the final calculation it gets compensated:\n");
    printf(" Average measured read read time at %i processes = %f ns\n",cpus,ToNano(averagereadread)-f_loop);
  }
  DeAllocate();
  return 0;
}
/* EOF latency.c */
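
To make the final "compensated" number easier to follow, here is a minimal sketch of just the arithmetic that the last printf calls in main() perform: reads per second are converted to nanoseconds per read, and the cost of the RNG loop alone is subtracted. The function names ops_per_second, rate_to_ns and compensated_ns are hypothetical helpers for illustration and are not part of the posted program, and the figures in the comments are made-up inputs, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: operations per second from an operation count and
 * an elapsed time in milliseconds (same formula as (1000*n)/timetaken). */
uint64_t ops_per_second(uint64_t n_ops, int elapsed_ms) {
  assert(elapsed_ms > 0); /* guard against divide by zero */
  return (n_ops * 1000u) / (uint64_t)elapsed_ms;
}

/* Hypothetical helper: nanoseconds per operation from a rate per second
 * (same idea as ToNano, but in double precision). */
double rate_to_ns(uint64_t per_second) {
  return 1e9 / (double)per_second;
}

/* Hypothetical helper: raw time per random read minus the time the
 * RNG-plus-modulo loop takes on its own. */
double compensated_ns(uint64_t reads_per_second, double rng_loop_ns) {
  return rate_to_ns(reads_per_second) - rng_loop_ns;
}

/* Illustrative inputs only:
 *   100,000,000 reads in 40,000 ms -> 2,500,000 reads a second
 *   2,500,000 reads a second       -> 400 ns a read (raw)
 *   with a 10 ns RNG loop          -> 390 ns a read (compensated)
 */
```

The subtraction assumes the RNG loop and the memory read do not overlap in time; on an out-of-order CPU that is only approximately true, so the compensated figure is best read as an estimate.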
Last modified: Thu, 15 Apr 21 08:11:13 -0700