Author: Robert Hyatt
Date: 14:23:37 07/17/03
On July 17, 2003 at 08:07:07, Vincent Diepeveen wrote:

>On July 16, 2003 at 18:25:51, Robert Hyatt wrote:
>
>>On July 16, 2003 at 16:12:45, Dieter Buerssner wrote:
>>
>>>On July 16, 2003 at 15:40:07, Gerd Isenberg wrote:
>>>
>>>>Yes I see, but to do exactly Vincent's sequence, calling RanrotA 100 million
>>>>times in a loop, you need a rather huge list ;-)
>>>
>>>Hi Gerd, for 100 million I would need 400 MB of RAM (which I can manage). I
>>>need 100e6*sizeof(pointer). But that is not really the point. My idea (one
>>>alternative at least) is to start with a cyclic linked list that just connects
>>>the elements in order at first, with the last entry pointing back to the
>>>start. Then use a typical card-shuffling routine to randomize this list. If
>>>the memory is over 400 MB, it would mean that we don't visit each cell. If it
>>>is lower, we will visit the cells several times. I would suspect that as long
>>>as memory is much bigger than the cache sizes, it makes little difference.
>>>But does it make a difference compared to lmbench-type sequential testing
>>>(with the same linked-list method)?
>>>
>>>And, after all, we use virtual memory nowadays. Doesn't this include one more
>>>indirection (done by hardware)? Without knowing much about it, I wouldn't be
>>>surprised if the hardware time for those indirections is needed more often
>>>with the random-access-style pattern.
>>
>>You are talking about the TLB.
>>
>>The memory mapping hardware needs two memory references to compute a real
>>address before it can be accessed. The TLB keeps the most recent N of these
>>translations around. If you go wild with random accessing, you will _certainly_
>>make memory latency 3x what it should be, because the TLB entries are 100%
>>useless. Of course that is not a sensible model, because 90+ percent of the
>>memory references in a chess program are _not_ scattered all over memory.
>
>I happen to have a special mode in Diep to generate some statistics. This is
>the difference between the stronger programs and the real amateurs.
>
>As you can see from the part of the data quoted below, at a big depth around 6%
>of the positions were in the hashtable (that includes positions where you do
>not get a cutoff but a depth-i hit, where i > 0, too). So 94% will have that 3x
>penalty of a slow access to RAM.

Not necessarily. First, the PC can use 4K or 4M pages. 4M pages x 62 TLB
entries (my dual Xeon has 62 TLB entries according to lmbench) means that I can
randomly access within about 250 megs of RAM with _no_ overhead, just 125-150ns
raw memory latency, as I have said repeatedly. If you wildly vary the
addresses, the MMU adds virtual-to-real translation overhead: another one, two,
or even three (memory) cycles. Of course not _all_ machines have this. The
Cray, for example, doesn't, because it doesn't _have_ virtual memory or any
address translation.
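For the curious, here is a minimal sketch of the kind of shuffled-linked-list
pointer-chasing test Dieter describes (not his actual code; NCELLS and rand()
are placeholders, and at this scale rand() ought to be replaced by RanrotA or
another decent PRNG):

/* Build a cyclic linked list over NCELLS cells, link them in a randomly
   shuffled order (Fisher-Yates over an index array), then chase the pointers
   so every load depends on the previous one.  Once the list is much bigger
   than the caches and the TLB reach, each step pays full memory latency plus
   the translation overhead discussed above.  The order[] scratch array
   doubles the footprint, so scale NCELLS to your RAM. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NCELLS (100 * 1000 * 1000UL)  /* ~100e6 cells, 400 MB with 4-byte pointers */

int main(void) {
    void **cell = malloc(NCELLS * sizeof *cell);
    size_t *order = malloc(NCELLS * sizeof *order);
    if (cell == NULL || order == NULL) return 1;

    for (size_t i = 0; i < NCELLS; i++) order[i] = i;

    /* Fisher-Yates shuffle: exactly one PRNG draw per element. */
    for (size_t i = NCELLS - 1; i > 0; i--) {
        size_t j = (size_t)((rand() / (RAND_MAX + 1.0)) * (i + 1));
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }

    /* Link the cells into one cycle, in the shuffled order. */
    for (size_t i = 0; i + 1 < NCELLS; i++)
        cell[order[i]] = &cell[order[i + 1]];
    cell[order[NCELLS - 1]] = &cell[order[0]];

    /* Chase pointers: each load depends on the one before it. */
    clock_t t0 = clock();
    void **p = &cell[order[0]];
    for (size_t i = 0; i < NCELLS; i++) p = *p;
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("%p: %.1f ns per dependent load\n", (void *)p, secs * 1e9 / NCELLS);
    free(cell);
    free(order);
    return 0;
}

Replace the shuffled linking with sequential linking (cell[i] = &cell[i+1])
and the same chase loop should show the lmbench-style sequential figure
instead.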
>Let's not forget a read to the evaluation table which stores the evaluation
>(always overwrite, 1 probe), and let's not forget also that pawn hashtable
>which you always set to gigantic proportions in Crafty.

I don't use "gigantic proportions." I can't say what you use, however. But it
doesn't matter. In the normal search, I do a hash probe every node, but _none_
in the q-search. In the q-search, I probe the pawn hash table for many nodes
(but not most, because the pawn structure doesn't change that often), but I
don't probe the hash table. So at _worst_, one probe per node. On my Xeon,
using one CPU, that's a million nodes per second, roughly. At nearly 3GHz, that
is potentially 6000 instructions per node, probably a bit less. One hash probe
every 6000 instructions is _not_ going to be a performance killer.

>Pawn hashtable in DIEP is very small. It doesn't store that much info, in
>fact. Only patterns that have to do with pawns, and no other knowledge inside
>the patterns.

I don't store much either. I think a pawn hash entry is 24 bytes, with 8 bytes
of that being the pawn hash signature.

>Here is a part of the statistics; note that the statistics are not printed
>with the mainline. Those are end of ply:
>
> 05:03:59 175029 175k 0 0 3192432900 (2) 16 (2528,12926) 0.040 Nd4xc6 b7xc6 b2-b3 e6-e5 f4xe5 d6xe5 Be2-c4 Bc8-g4 Qd1-c1 Qc7-a5 Qc1-e1 Be7-b4 Be3-d2 Qa5-c7
>depth=16 nps=175135 ttl=5088790298 aborted=930676670 (afh=2807,splits=15271) time=29056.33s
>Statistics ply=16 for Successful Hashing
> Eval: perc= 14.03 hash=601232768 calls=4282359068 from which m-e=0
> Pawn: perc= 91.40 hash=3364694961 calls=3681126300
>Statistics Qsearch totally 4265810730 nodes (83.82) from which 2029063989 firstply leafs
>[snip]
>tot=0 remainsbest=0 cutsame=0 cutother=0
>(0=9.07% 74696628) (1=64.21% 528475969) (2=16.13% 132814700) (3=7.65% 62972325)
>(4=1.68% 13888078) (5=0.89% 7366238) (6=0.18% 1555420) (7=0.10% 897605)
>(8=0.02% 173336) (9=0.01% 106509) (10=0.00% 17823) (11=0.00% 12133)
>(12=0.00% 1602) (13=0.00% 1065) (14=0.00% 94) (15=0.00% 43)
>
>total main search depth count = 822979568 (16.61)

What is the point of that? My pawn hashing, for instance, is always around 99%,
period, unless it hits 100% in endgames.

>>>BTW, to shuffle a deck of N cards, one needs N calls to the PRNG. People who
>>>try this for the first time often do it rather wrong: they randomly choose 2
>>>indices and swap those cards. This is a very inefficient method, and even
>>>with many more than N calls to the PRNG, the deck is not shuffled well.
>>>
>>>I did not start yet. When I have the results, I can think about possible
>>>problems with mod, loop unrolling, etc. (I will use loop unrolling for the
>>>random linked-list test.)
>>>
>>>Cheers,
>>>Dieter
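Dieter's shuffling remark is easy to check with a quick simulation (a sketch,
not his test; the constants are illustrative): shuffle a 52-card deck by
picking two random indices and swapping, and after 52 such swaps roughly
(1 - 1/52)^104, about 13%, of the cards were never picked at all, so they are
provably still in their starting positions. Fisher-Yates, by contrast, gets an
unbiased shuffle from just 51 PRNG draws.

/* Illustration: count how many cards a "pick two random indices and swap"
   shuffle never touches.  A card untouched by every swap is still in its
   original position, so the deck is poorly mixed even though the PRNG was
   called twice per swap. */
#include <stdio.h>
#include <stdlib.h>

#define DECK   52
#define TRIALS 100000

int main(void) {
    long untouched = 0;

    for (int t = 0; t < TRIALS; t++) {
        int touched[DECK] = {0};
        for (int s = 0; s < DECK; s++) {        /* DECK random-pair swaps */
            touched[rand() % DECK] = 1;
            touched[rand() % DECK] = 1;
        }
        for (int c = 0; c < DECK; c++)
            if (!touched[c]) untouched++;
    }

    /* Expect roughly (1 - 1/52)^104 * 52, i.e. about 7 untouched cards. */
    printf("average untouched cards per deck: %.2f of %d\n",
           (double)untouched / TRIALS, DECK);
    return 0;
}

A proper Fisher-Yates pass (as in the earlier sketch) touches every card by
construction.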