Computer Chess Club Archives



Subject: Re: Source code to measure it - there is something wrong

Author: Robert Hyatt

Date: 14:23:37 07/17/03



On July 17, 2003 at 08:07:07, Vincent Diepeveen wrote:

>On July 16, 2003 at 18:25:51, Robert Hyatt wrote:
>
>>On July 16, 2003 at 16:12:45, Dieter Buerssner wrote:
>>
>>>On July 16, 2003 at 15:40:07, Gerd Isenberg wrote:
>>>
>>>>Yes I see, but to do exactly Vincent's sequence, calling RanrotA 100 million
>>>>times in a loop, you need a rather huge list ;-)
>>>
>>>Hi Gerd, for 100 million, I would need 400 MB of RAM (which I have): I need
>>>100e6*sizeof(pointer). But that is not really the point. My idea (one
>>>alternative at least) is to start with a cyclic linked list that at first just
>>>connects the elements in order, with the last entry pointing back to the
>>>start. Then use a typical card-shuffling routine to randomize this list. If
>>>the memory is over 400 MB, that would mean we don't visit each cell; if it is
>>>lower, we will visit the cells several times. I would suspect that as long as
>>>memory is much bigger than the cache sizes, it makes little difference. But
>>>does it make a difference compared to lmbench-type sequential testing (with
>>>the same linked-list method)?
>>>
>>>And, after all, we use virtual memory nowadays. Doesn't this add one more
>>>indirection (done by hardware)? Without knowing much about it, I wouldn't be
>>>surprised if the hardware time for those indirections is needed more often
>>>with the random-access style pattern.
>>
>>You are talking about the TLB.
>>
>>The memory mapping hardware needs two memory references to compute a real
>>address before it can be accessed.  The TLB keeps the most recent N of these
>>things around.  If you go wild with random accessing, you will _certainly_
>>make memory latency 3x what it should be, because the TLB entries are 100%
>>useless.  Of course that scenario is not realistic, because 90+ percent of the
>>memory references in a chess program are _not_ scattered all over memory.
>
>I happen to have a special mode in DIEP to generate some statistics. This is
>the difference between the stronger programs and the real amateurs.
>
>As you can see from the part of the data quoted below, at a big depth around
>6% of the positions were in the hashtable (that includes positions where you
>do not get a cutoff but only a stored depth of depth-i with i > 0). So 94%
>will have that 3x penalty of a slow access to RAM.

Not necessarily.  First, the PC can use 4K or 4M pages.  4M pages x 62 TLB
entries (my dual Xeon has 62 according to lm-bench) means that I can randomly
access within roughly 250 megs of RAM with _no_ translation overhead, just the
125-150ns raw memory latency, as I have said repeatedly.

If you wildly vary the addresses beyond that, the MMU adds virtual-to-real
translation overhead: another one, two, or even three cycles (memory cycles).

Of course not _all_ machines have this overhead.  The Cray, for example, does
not, because it doesn't _have_ virtual memory or any address translation.

>
>Let's not forget a read to the evaluation table which stores evaluations
>(always-overwrite, 1 probe), and let's not forget the pawn hashtable, which
>you always size to gigantic proportions in Crafty.

I don't use "gigantic proportions."  I can't say what you use, however.
But it doesn't matter.  In the normal search, I do a probe every node, but
_none_ in the q-search.  In the q-search, I probe the pawn hash table for
many nodes (but not most, because the pawn structure doesn't change that
often), but I don't probe the main hash table.  So at _worst_, one probe per
node.  On my Xeon, using one CPU, that's roughly a million nodes per second.
At nearly 3GHz, that is about 3,000 cycles per node, or, at roughly two
instructions per cycle, potentially 6,000 instructions per node, probably a
bit less.  One hash probe every 6,000 instructions is _not_ going to be a
performance killer.



>
>The pawn hashtable in DIEP is very small. It doesn't store that much info, in
>fact: only patterns that have to do with pawns, and no other knowledge inside
>the patterns.

I don't store much either.  I think a pawn hash entry is 24 bytes, 8 bytes of
which are the pawn hash signature.

> Here is a part of the statistics; note that the statistics are not printed
>with the mainline, but at the end of each ply:
>
>        05:03:59 175029 175k 0 0 3192432900 (2) 16 (2528,12926) 0.040 Nd4xc6 b7xc6 b2-b3 e6-e5 f4xe5 d6xe5 Be2-c4 Bc8-g4 Qd1-c1 Qc7-a5 Qc1-e1 Be7-b4 Be3-d2 Qa5-c7
>depth=16 nps=175135 ttl=5088790298 aborted=930676670 (afh=2807,splits=15271) time=29056.33s
>Statistics ply=16 for Successful Hashing
>  Eval:   perc= 14.03 hash=601232768 calls=4282359068 from which m-e=0
>  Pawn:   perc= 91.40 hash=3364694961 calls=3681126300
>Statistics Qsearch totally 4265810730 nodes (83.82) from which 2029063989 firstply leafs
>[snip]
>tot=0 remainsbest=0 cutsame=0 cutother=0
>(0=9.07% 74696628) (1=64.21% 528475969) (2=16.13% 132814700) (3=7.65% 62972325)
>(4=1.68% 13888078) (5=0.89% 7366238) (6=0.18% 1555420) (7=0.10% 897605)
>(8=0.02% 173336) (9=0.01% 106509) (10=0.00% 17823) (11=0.00% 12133)
>(12=0.00% 1602) (13=0.00% 1065) (14=0.00% 94) (15=0.00% 43)
>
>total main search depth count = 822979568 (16.61)

What is the point of that?

I.e., my pawn hashing is always around 99%, period, unless it hits 100%
in endgames.


>
>>>
>>>BTW, to shuffle a deck of N cards, one needs N calls to the PRNG. People who
>>>try this for the first time often do it rather wrong: they repeatedly choose
>>>2 random indices and swap those cards. This is a very inefficient method, and
>>>even with many more than N calls to the PRNG, the deck is not shuffled well.
>>>
>>>I have not started yet. When I have the results, I can think about possible
>>>problems with mod, loop unrolling, etc. (I will use loop unrolling for the
>>>random linked-list test).
>>>
>>>Cheers,
>>>Dieter


