Computer Chess Club Archives


Subject: Re: Source code to measure it - there is something wrong

Author: Gerd Isenberg

Date: 12:25:08 07/16/03


On July 16, 2003 at 14:41:45, Dieter Buerssner wrote:

>Just a few comments about the thread.
>
>An interesting test would be to do an lmbench-type linked-list test with Vincent's
>idea of real random access. I may try it out later. No PRNG calls will be
>needed; the linked list will be initialized "pseudo randomly". In this case the
>access pattern would not be too close to truly random, because in one cycle
>every memory address will be read exactly once. (This could easily happen
>anyway with not-so-decent PRNGs.)
>
>A perhaps interesting comment from the lmbench source:
>
>
>       /*
>        * First create a list of pointers.
>        *
>        * This used to go forwards, we want to go backwards to try and defeat
>        * HP's fetch ahead.
>        *
>        * We really need to do a random pattern once we are doing one hit per
>        * page.
>        */
>
>So the authors did not seem too confident in sequential-like access? Or did I
>misunderstand?
>
>The PRNG Vincent uses is fine. I will do some tests on it. Lagged Fibonacci type
>generators don't have problems with mod (rand() often uses a linear congruential
>generator, which can have severe problems, especially when used with mod).
>Anyway, for this sort of test, I think even very bad PRNGs would do well. There
>is no way the hardware can guess the access pattern.
>
>Regards,
>Dieter

Hi Dieter,

What about the hash size and the mod?

I tried Vincent's test with some loop unrolling - very strange.

int DoNrng(BITBOARD n) {
  BITBOARD i,i1,i2,dummyres,nents;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;

  n >>= 1;
  t1 = GetClock();
  for (i=0; i < n; i += 2) {
    i1 = RanrotA()%nents;
    dummyres ^= i1;
    i2 = RanrotA()%nents;
    dummyres ^= i2;
  }
  t2 = GetClock();
  globaldummy = dummyres;
  return(t2-t1);
}

int DoNreads(BITBOARD n) {
  BITBOARD i=0,dummyres,nents, i1, i2;
  int t1,t2;

  nents = nentries; /* hopefully this gets into a register */
  dummyres = globaldummy;

  n >>= 1;
  t1 = GetClock();
  for (i=0; i < n; i += 2) {
    i1 = RanrotA()%nents;
    dummyres ^= hashtable[i1];
    i2 = RanrotA()%nents;
    dummyres ^= hashtable[i2];
  }
  t2 = GetClock();
  globaldummy = dummyres;
  return(t2-t1);
}

Even stranger with more aggressive unrolling ;-)
What is happening here? Did I make an error with the unrolling?

Cheers,
Gerd



I switched off optimization but still got results like this (even with longer run times):

C:\Source\latency\Release>latency 300000000 1
Welcome to RASM Latency!
RASML measures the RANDOM AVERAGE SHARED MEMORY LATENCY!

Stored in rasmexename = C:\Source\latency\Release\latency.exe
Trying to allocate 37500000 entries. In total 300000000 bytes
Benchmarking Pseudo Random Number Generator speed, RanRot type 'B'!
Speed depends upon CPU and compile options from RASML,
 therefore we benchmark the RNG
Please wait a few seconds.. ..took 5368 milliseconds to generate  numbers
Speed of RNG = 19076005 numbers a second
So 1 RNG call takes 52.421878 nanoseconds
Benchmarking random RNG test. Please wait..
timetaken=2424
Machine needs 48.480002 ns for RND loop
Trying to Allocate Buffer
Took 0.000 seconds to allocate Hash
Clearing hashtable
Took 0.821 seconds to clear Hash
Starting Other processes
Took 0 milliseconds to start 0 additional processes
Read latency measurement STARTS NOW using steps of 2 * 1.000 seconds :
Raw Average measured read read time at 1 processes = 155.465361 ns
Now for the final calculation it gets compensated:
  Average measured read read time at 1 processes = 106.985360 ns


The assembler listing of DoNreads:

?DoNreads@@YAH_K@Z PROC NEAR				; DoNreads

; 427  : int DoNreads(BITBOARD n) {

	push	ebp
	mov	ebp, esp
	sub	esp, 48					; 00000030H
	push	esi

; 428  :   BITBOARD i,dummyres,nents, i1, i2;
; 429  :   int t1,t2;
; 430  :
; 431  :   nents = nentries; /* hopefully this gets into a register */

	mov	eax, DWORD PTR ?nentries@@3_KA
	mov	DWORD PTR _nents$[ebp], eax
	mov	ecx, DWORD PTR ?nentries@@3_KA+4
	mov	DWORD PTR _nents$[ebp+4], ecx

; 432  :   dummyres = globaldummy;

	mov	edx, DWORD PTR ?globaldummy@@3_KA
	mov	DWORD PTR _dummyres$[ebp], edx
	mov	eax, DWORD PTR ?globaldummy@@3_KA+4
	mov	DWORD PTR _dummyres$[ebp+4], eax

; 433  :
; 434  :   n >>= 1;

	mov	eax, DWORD PTR _n$[ebp]
	mov	edx, DWORD PTR _n$[ebp+4]
	mov	ecx, 1
	call	__aullshr
	mov	DWORD PTR _n$[ebp], eax
	mov	DWORD PTR _n$[ebp+4], edx

; 435  :   t1 = GetClock();

	call	?GetClock@@YAHXZ			; GetClock
	mov	DWORD PTR _t1$[ebp], eax

; 436  :   for (i=0; i < n; i += 2) {

	mov	DWORD PTR _i$[ebp], 0
	mov	DWORD PTR _i$[ebp+4], 0
	jmp	SHORT $L43049
$L43050:
	mov	ecx, DWORD PTR _i$[ebp]
	add	ecx, 2
	mov	edx, DWORD PTR _i$[ebp+4]
	adc	edx, 0
	mov	DWORD PTR _i$[ebp], ecx
	mov	DWORD PTR _i$[ebp+4], edx
$L43049:
	mov	eax, DWORD PTR _i$[ebp+4]
	cmp	eax, DWORD PTR _n$[ebp+4]
	ja	$L43051
	jb	SHORT $L43322
	mov	ecx, DWORD PTR _i$[ebp]
	cmp	ecx, DWORD PTR _n$[ebp]
	jae	$L43051
$L43322:

; 437  :     i1 = RanrotA()%nents;

	call	?RanrotA@@YA_KXZ			; RanrotA
	mov	ecx, DWORD PTR _nents$[ebp+4]
	push	ecx
	mov	ecx, DWORD PTR _nents$[ebp]
	push	ecx
	push	edx
	push	eax
	call	__aullrem
	mov	DWORD PTR _i1$[ebp], eax
	mov	DWORD PTR _i1$[ebp+4], edx

; 438  :     dummyres ^= hashtable[i1];

	push	0
	push	8
	mov	edx, DWORD PTR _i1$[ebp+4]
	push	edx
	mov	eax, DWORD PTR _i1$[ebp]
	push	eax
	call	__allmul
	mov	ecx, DWORD PTR ?hashtable@@3PA_KA	; hashtable
	mov	edx, DWORD PTR _dummyres$[ebp]
	xor	edx, DWORD PTR [ecx+eax]
	mov	esi, DWORD PTR _dummyres$[ebp+4]
	xor	esi, DWORD PTR [ecx+eax+4]
	mov	DWORD PTR _dummyres$[ebp], edx
	mov	DWORD PTR _dummyres$[ebp+4], esi

; 439  :     i2 = RanrotA()%nents;

	call	?RanrotA@@YA_KXZ			; RanrotA
	mov	ecx, DWORD PTR _nents$[ebp+4]
	push	ecx
	mov	ecx, DWORD PTR _nents$[ebp]
	push	ecx
	push	edx
	push	eax
	call	__aullrem
	mov	DWORD PTR _i2$[ebp], eax
	mov	DWORD PTR _i2$[ebp+4], edx

; 440  :     dummyres ^= hashtable[i2];

	push	0
	push	8
	mov	edx, DWORD PTR _i2$[ebp+4]
	push	edx
	mov	eax, DWORD PTR _i2$[ebp]
	push	eax
	call	__allmul
	mov	ecx, DWORD PTR ?hashtable@@3PA_KA	; hashtable
	mov	edx, DWORD PTR _dummyres$[ebp]
	xor	edx, DWORD PTR [ecx+eax]
	mov	esi, DWORD PTR _dummyres$[ebp+4]
	xor	esi, DWORD PTR [ecx+eax+4]
	mov	DWORD PTR _dummyres$[ebp], edx
	mov	DWORD PTR _dummyres$[ebp+4], esi

; 441  :   }

	jmp	$L43050
$L43051:

; 442  :   t2 = GetClock();

	call	?GetClock@@YAHXZ			; GetClock
	mov	DWORD PTR _t2$[ebp], eax

; 443  :   globaldummy = dummyres;

	mov	eax, DWORD PTR _dummyres$[ebp]
	mov	DWORD PTR ?globaldummy@@3_KA, eax
	mov	ecx, DWORD PTR _dummyres$[ebp+4]
	mov	DWORD PTR ?globaldummy@@3_KA+4, ecx

; 444  :   return(t2-t1);

	mov	eax, DWORD PTR _t2$[ebp]
	sub	eax, DWORD PTR _t1$[ebp]

; 445  : }

	pop	esi
	mov	esp, ebp
	pop	ebp
	ret	0
?DoNreads@@YAH_K@Z ENDP					; DoNreads




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.