Author: Ricardo Gibert
Date: 01:25:44 04/16/03
Go up one level in this thread
On April 16, 2003 at 00:05:02, Robert Hyatt wrote: >On April 15, 2003 at 08:56:54, Vincent Diepeveen wrote: > >>On April 14, 2003 at 17:43:12, Robert Hyatt wrote: >> >>>On April 14, 2003 at 17:15:41, Vincent Diepeveen wrote: >>> >>>>On April 13, 2003 at 22:39:39, Robert Hyatt wrote: >>>> >>>>>On April 13, 2003 at 11:49:28, Vincent Diepeveen wrote: >>>>> >>>>>>On April 13, 2003 at 11:27:53, Robert Hyatt wrote: >>>>>> >>>>>>I said initially. It drops back to 10 splits a second in DIEP after a while. >>>>>>Search depth matters. >>>>>> >>>>>>Let's compare 2 things. >>>>>> >>>>>> time=45.98 cpu=464% mat=0 n=37870294 fh=88% nps=823k >>>>>> ext-> chk=638414 cap=249442 pp=9588 1rep=32966 mate=223 >>>>>> predicted=0 nodes=37870294 evals=14565859 >>>>>> endgame tablebase-> probes done=0 successful=0 >>>>>> hashing-> trans/ref=28% pawn=93% used=28% >>>>>> SMP-> split=431 stop=57 data=6/64 cpu=3:33 elap=45.98 >>>>>> >>>>>>MT 2 crafty 18.10 which i have here. 431 splits at 45 seconds. I guess you must >>>>>>limit in crafty the number of splits a lot as splitting is expensive in crafty >>>>>>when compared to the costs of a single node. >>>>> >>>>>I'm not sure how expensive it is compared to a node. I'll run a test where >>>>>I do the split overhead at every node to compare, however... >>>>> >>>>> >>>>> >>>>>I don't limit them at all. The only limit is the YBW algorithm. But I split >>>>>at the root also, which reduces them signficantly... >>>> >>>>I can split at the root nowadays, but i have turned it off for diep. it gives >>>>too poor speedup for me. The interesting thing which searching SMP can give is >>>>transpositions at a big depth which possibly are overwritten by a sequential >>>>search. i don't want to miss them. >>> >>>Maybe you don't split at the root correctly. I limit this with some intelligent >>>guesswork, so that if it appears that I might change my mind this iteration, >>>then >>>I don't split at the root until I have searched all moves that I think might >>>replace >>>the best move... >> >>i don't have a bug there. With just 2 or 4 cpu's you can split at so many points >>that i chose to not split at the root. It gives a bad speedup. Not near 2.0 to >>be precise. > >Sorry, but that is wrong. The root is the _perfect_ place to split to avoid >search overhead, because we _know_ that every root move must be searched, while >we can't say that with 100% accuracy about any _other_ position in the tree... > >See Knuth/Moore, "An analysis of alpha/beta pruning" 100% accuracy? What about a position that is a mate in one and the first move you look at gives mate. Why would you want to continue searching any more moves? > > >> >>However if splitting is as expensive as it is in crafty i can very well >>understand you do it. > >It's not done because it is too expensive to not split at the root. It is >done (by me) because it is more _efficient_ to do so... The math is simple. > > >> >>> >>>> >>>>As i showed half a year ago the chance is a bigger with SMP 2 threads/processes >>>>that the chance that a transposition cutoff occurs with a depthleft a slighly >>>>bit bigger on average than when doing deep sequential searches (of course >>>>hashtable needs to be able to get filled quite some, but under practical >>>>tournament conditions this is the case in most programs). >>>> >>>>I will however again experiment with splitting in root with a 128 processor run, >>>>when this works very well. Not to reduce number of splits so much but to get the >>>>cpu's sooner non-idling (where idling as we know is not really idling at all). >>>> >>>>128 cpu runs of 10 minutes are not too expensive. 1280 minutes / 60 = 21 cpu >>>>hour. Of course the only hard thing is when you are unlucky with a run (each run >>>>can be different and perhaps one time you have a very poor run which gives a bad >>>>speedup, where reality is it would give a better speedup). >>>> >>>>Anyway splitting in root doesn't work for me with 2-16 cpu's. >>> >>>As I said, you have to think about it. There are ways to make it work, and it >>>lowers overhead drastically when it is done correctly. (search overhead goes >>>down). >> >>i have it to work, it just doesn't give a very good speedup at all, that's why i >>do not do it. >> >>Right now there is a simple extra 'if' condition >> >> if( .. && realply > 1 .. >> >>If i kick it out then it splits at the root. >> > >That's not good enough, as I said.. > >There is _more_ to consider when making the decision... > > >> >> >>>> >>>>Best regards, >>>>Vincent >>>> >>>>> >>>>>> >>>>>>Let's ignore the cpu=464% i do not understand why it says that. I have it at >>>>>>mt=2. probably small i/o bug. >>>>>> >>>>>>Now let's diep search for around this time: >>>>>> >>>>>>Took 0.12 seconds to start all 1 other processes out of 2 >>>>>>00:00 21 0k 0 0 21 (2) 2 (0,0) -0.022 Ng1-f3 d7-d5 >>>>>>++ d2-d4 procnr=0 terug=1 org=[-22;-21] newwindow=[-22;520000] >>>>>>00:00 71 0k 0 0 71 (2) 2 (0,0) 0.001 d2-d4 d7-d5 >>>>>>00:00 175 0k 0 0 175 (2) 3 (0,2) 0.157 d2-d4 d7-d5 Ng1-f3 >>>>>>00:00 443 0k 0 0 443 (2) 4 (0,5) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 >>>>>>00:00 150800 151k 0 0 1508 (2) 5 (0,19) 0.190 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3 >>>>>>00:00 318900 319k 0 0 3189 (2) 6 (0,27) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3 N >>>>>>b8-c6 >>>>>>00:00 149744 150k 0 0 13477 (2) 7 (3,68) 0.179 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Bc1-f4 >>>>>>Nf6-h5 Bf4-g5 >>>>>>00:00 136110 136k 0 0 27222 (2) 8 (6,147) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Bc1-f4 >>>>>> Nf6-h5 Bf4-g5 Nb8-c6 >>>>>>00:01 127109 127k 0 0 205917 (2) 9 (45,502) 0.105 d2-d4 Ng8-f6 Nb1-c3 Nb8-c6 Bc1 >>>>>>-f4 d7-d6 Ng1-f3 Bc8-f5 e2-e3 >>>>>>00:04 127013 127k 0 0 572829 (2) 10 (76,666) 0.001 d2-d4 Ng8-f6 Nb1-c3 d7-d5 Bc1 >>>>>>-f4 Bc8-f5 Ng1-f3 Nb8-c6 Nf3-e5 Nf6-e4 >>>>>>00:17 152655 153k 0 0 2648566 (2) 11 (330,1980) 0.108 d2-d4 d7-d5 Ng1-f3 Nb8-c6 >>>>>>Nb1-c3 Bc8-f5 Nf3-h4 Bf5-c8 Bc1-g5 Ng8-f6 e2-e3 >>>>>>00:38 154041 154k 0 0 5889009 (2) 12 (743,4189) 0.008 d2-d4 d7-d5 Bc1-f4 Bc8-f5 >>>>>>Ng1-f3 Ng8-f6 Nb1-c3 Nb8-c6 Nc3-b5 Ra8-c8 Nf3-e5 Nc6xe5 d4xe5 >>>>>> >>>>>>Of course if i use same conditions like crafty when to split then it will look >>>>>>different with regards to the number of splits performed. >>>>>> >>>>>>Splitting in diep is very cheap. I already split >= 2 ply left searches and i >>>>>>split quickly in current versions. >>>>> >>>>>I split everywhere. It is possible to limit this and I think the current >>>>>version avoids splitting at the last 2-3 plies of the tree. I haven't tested >>>>>this on my dual to see if the current value is correct, however... >>>>> >>>>> >>>>>> The reason is that you get 500 cpu's quicker >>>>>>busy and find bugs sooner. No doubt in future i will again optimize it to a >>>>>>state where it will optimize search depth more at x86. If that's with many >>>>>>splits a second at 2-4 processes, then i'll go for that. If it is with less >>>>>>splits a second i'll go for that. >>>>>> >>>>>>Note that the 4189 number at 12 ply is not the number of splits only, it is the >>>>>>total number of searches. So about 11*20 + 1 = 220 + 1 = 221 are from searching >>>>>>the root. >>>>>> >>>>>>>On April 13, 2003 at 08:32:37, Vincent Diepeveen wrote: >>>>>>> >>>>>>>>On April 13, 2003 at 08:21:42, Vincent Diepeveen wrote: >>>>>>>> >>>>>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote: >>>>>>>>> >>>>>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote: >>>>>>>>>> >>>>>>>>>>>It _is_ pinned on SMT. The two logical processors are producing wildly >>>>>>>>>>>imbalanced results when using threads, vs using two separate processes. It >>>>>>>>>>>would appear to be cache-related... >>>>>>>>>> >>>>>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you >>>>>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT? >>>>>>>>>> >>>>>>>>>>How much of the time are your threads idle, out of curiosity? If one thread is >>>>>>>>>>idle much more than the other, then of course that is going to skew your NPS. >>>>>>>>>> >>>>>>>>>>-Tom >>>>>>>>> >>>>>>>>>Of course both Crafty and DIEP are using YBW. I didn't checkout what bob does >>>>>>>>>here, but in past in DIEP i used to always let process 0 let the search start. >>>>>>>>>Nowadays that is not the case. The i/o thread picks the first process it can >>>>>>>>>get. All search processes are completely identical. This process then is >>>>>>>>>starting the search. That means the other CPUs idle when this process starts the >>>>>>>>>search. >>>>>>>> >>>>>>>>also read that 'idle' not in litterary sense. Letting them REALLY idle with >>>>>>>>sleep() or WaitForSingleObject, is at a REAL smp system (like dual K7) just too >>>>>>>>expensive. Latency to wake up processors is at sick high levels. 15 ms just like >>>>>>>>that. Imagine that because of the YBW search, you have to split initially like >>>>>>>>50-100 times a second. 15ms is death sentence. So 'idle' cpu's are spinning >>>>>>>>around until at a shared memory variable some flag is set. I let them do some >>>>>>>>arithmetic function for a 100 times while 'idling'. >>>>>>> >>>>>>>If you do this right you won't split _that_ often. >>>>>>> >>>>>>> time=35.97 cpu=381% mat=-1 n=80006982 fh=92% nps=2224k >>>>>>> ext-> chk=1487513 cap=353299 pp=32860 1rep=79236 mate=15135 >>>>>>> predicted=3 nodes=80006982 evals=19493470 >>>>>>> endgame tablebase-> probes done=0 successful=0 >>>>>>> SMP-> split=1840 stop=163 data=15/64 cpu=2:17 elap=35.97 >>>>>>> time used: 29.81 >>>>>>> >>>>>>> >>>>>>>In the above from a game on ICC, in 35 seconds, I did 1800 splits total. The >>>>>>>deeper the search the better this becomes... >>>>>>> >>>>>>> time=2:33 cpu=396% mat=0 n=282753699 fh=91% nps=1840k >>>>>>> ext-> chk=3046093 cap=1083298 pp=16735 1rep=192964 mate=3400 >>>>>>> predicted=8 nodes=282753699 evals=114936261 >>>>>>> endgame tablebase-> probes done=0 successful=0 >>>>>>> SMP-> split=2683 stop=424 data=15/64 cpu=10:09 elap=2:33 >>>>>>> time used: 8.29 >>>>>>> >>>>>>> time=4:03 cpu=396% mat=0 n=466004128 fh=90% nps=1911k >>>>>>> ext-> chk=3120074 cap=1773259 pp=60704 1rep=227466 mate=5595 >>>>>>> predicted=9 nodes=466004128 evals=160300467 >>>>>>> endgame tablebase-> probes done=0 successful=0 >>>>>>> SMP-> split=5811 stop=950 data=18/64 cpu=16:06 elap=4:03 >>>>>>> time used: 2:43 >>>>>>> >>>>>>> time=3:47 cpu=396% mat=0 n=421757405 fh=92% nps=1855k >>>>>>> ext-> chk=3436512 cap=1222511 pp=75583 1rep=186606 mate=3165 >>>>>>> predicted=12 nodes=421757405 evals=149496490 >>>>>>> endgame tablebase-> probes done=0 successful=0 >>>>>>> SMP-> split=3524 stop=337 data=17/64 cpu=15:01 elap=3:47 >>>>>>> >>>>>>>> >>>>>>>>>In crafty that's also the case, but i do not know whether Bob always picks a >>>>>>>>>certain thread as first. If so then that might explain quite something. >>>>>>>>> >>>>>>>>>Measuring idle time with SMT is very hard to do objective, but of course you can >>>>>>>>>relatively check it out. Basically the problem is you do not know what the >>>>>>>>>maximum % is that i can get out of SMT, because it is dependant upon the other >>>>>>>>>process too.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.