Computer Chess Club Archives



Subject: Re: supercomputers and their future for gameplaying

Author: Bob Durrett

Date: 08:46:56 12/23/03



On December 23, 2003 at 00:36:10, Vincent Diepeveen wrote:

>On December 22, 2003 at 23:56:16, Robert Hyatt wrote:
>
>>On December 22, 2003 at 22:42:17, Vincent Diepeveen wrote:
>>
>>>On December 22, 2003 at 21:57:07, Vincent Diepeveen wrote:
>>>
>>>Because my report will take a lot of time and will be very extensive, it
>>>might be interesting to share one thing that amazed me by the clarity of
>>>its results.
>>>
>>>That is that projects like chessbrain are completely doomed when it comes
>>>to working in real time. *Completely* doomed.
>>>
>>>I will show some data here.
>>>The question I asked myself was this: it is obvious that CPUs get faster
>>>and faster each year, yet processor-to-processor latency does not keep up.
>>>Several researchers have already shown this. Under this assumption it
>>>follows that the hashtable cannot be shared, so one must use a local
>>>hashtable for the last few plies. A very clear case where this has to
>>>happen is searching in a distributed project (like chessbrain).
>>>
>>>In Diep there is a parameter controlling which depths are stored in (and
>>>retrieved from) the global hashtable and which are stored locally. The
>>>tournament version stored all internal nodes into the hashtable, but the
>>>quiescence search was stored only locally.
>>>First the normal output at the position:
>>>
>>>Please remember that I chose a position which very quickly showed itself
>>>to be a worst case, and the hashtables were cleared before starting the
>>>search on it. That is why I used it.
>>>
>>>First the normal diep version:
>>>        00:00      0 0 0 (460) 1 (0,0) 2.119 b4xc5
>>>        00:00      0 0 0 (460) 2 (0,0) 2.119 b4xc5 Qc7xc5
>>>        00:02      0 0 0 (460) 3 (0,0) 2.237 b4xc5 Ra8-e8 Nc3-e4 Bd6xc5
>>>        00:02      0 0 0 (460) 4 (0,0) 2.237 b4xc5 Ra8-e8 Nc3-e4 Bd6xc5
>>>        00:40      0 0 0 (460) 5 (0,0) 1.883 b4xc5 Rf8-e8 Nc3-e2 Qc7-a5 c2-c3
>>>        00:45      0 0 0 (460) 6 (0,0) 1.846 b4xc5 Ra8-e8 Nc3-e4 Ne7-f5 Nf3-g5
>>>Bd6xc5
>>>        00:53      0 0 0 (460) 7 (0,0) 2.406 b4xc5 Rf8-e8 Nc3-e4 Ne7-f5 Nf3-g5
>>>Qc7xc5 c2-c3
>>>        00:56      0 0 0 (460) 8 (0,0) 1.739 b4xc5 Rf8-e8 Nc3-e4 Ne7-f5 Nf3-g5
>>>Qc7xc5 Ra1-b1 Nf5-d4
>>>        01:07      0 0 0 (460) 9 (0,0) 1.970 b4xc5 Rf8-e8 Nc3-e4 Ne7-f5 Nf3-g5
>>>Qc7xc5 Qd1-d2 Nf5-d4 c2-c3
>>>        01:34 563556 53024985 0 (460) 10 (15969,151439) 1.204 b4xc5 Rf8-e8
>>>Nf3-g5 Ne7-f5 Nc3-e4 Bd6xc5 Qd1-g4 Nf5-d6 Ng5-e6 Re8xe6 d5xe6
>>>depth=10 nps=607055 ttl=72846666 egtb=0 aborted=48692678
>>>(afh=25543,splits=261134) time=120.00s
>>>
>>>        03:08 758373 142589422 0 (460) 11 (47875,499297) 1.095 b4xc5 Rf8-e8
>>>Nc3-e2 Qc7xc5 c2-c4 Ba6xc4 Ba2xc4 Qc5xc4 Qd1-d4 Qc4xd5 O-O
>>>depth=11 nps=813654 ttl=182877066 egtb=0 aborted=112295422
>>>(afh=67685,splits=693612) time=224.76s
>>>
>>>        03:58 908254 216881978 0 (460) 12 (78112,841881) 1.050 b4xc5 Rf8-e8
>>>Nc3-e2 Qc7xc5 c2-c4 Ba6xc4 Ba2xc4 Qc5xc4 Qd1-d4 Qc4-c2 Qd4-c3 Qc2-e4
>>>depth=12 nps=957532 ttl=386182356 egtb=0 aborted=241106908
>>>(afh=127633,splits=1400567) time=403.31s
>>>
>>>        09:36 1298189 748172751 0 (460) 13 (245417,2398027) 0.444 b4xc5 Rf8-e8
>>>Nc3-e2 Qc7xc5 c2-c4 Ne7xd5 Qd1-d4 Qc5xd4 Nf3xd4 Nd5-c3 Ke1-f2 Bd6-c5 Kf2-f3
>>>depth=13 nps=3168138 ttl=3095778486 egtb=0 aborted=1759560355
>>>(afh=1258955,splits=8177463) time=977.16s
>>>
>>>        19:20 3364631 3906067861 0 (460) 14 (1603537,10670458) 0.117 b4xc5
>>>Rf8-e8 Nc3-e2 Qc7xc5 c2-c4 Ne7xd5 Qd1-d4 Qc5xd4 Nf3xd4 Nd5-c3 a3-a4 f4-f3
>>>        50:54 5317603 16243842977 0 (460) 14 (5403791,33550882) 0.212 b4-b5
>>>Ba6-b7 Qd1-d2 Ne7-f5 O-O Nf5-e3 Rf1-f2 Rf8-e8 Qd2-d4 Ra8-c8 Bc1xe3 f4xe3 Rf2-e2
>>>Nc5-d3
>>>depth=14 nps=5015287 ttl=17096414590 egtb=0 aborted=9662055619
>>>(afh=5723154,splits=36131262) time=3408.86s
>>>
>>>        01:00:31 5064710 18392394781 0 (460) 15 (6077444,39451488) 0.264 b4-b5
>>>Ba6-b7 Qd1-d2 Ne7-f5 O-O Nf5-e3 Rf1-f2 f7-f5 Qd2-d4 Ra8-e8 Ba2-c4 Nc5-e4 Bc1xe3
>>>f4xe3 a3-a4
>>>depth=15 nps=4294115 ttl=20628201297 egtb=0 aborted=11380590805
>>>(afh=6773788,splits=45841476) time=4803.83s
>>>
>>>        01:31:30 4466087 24521993032 0 (460) 16 (7723311,52786168) 0.160 b4-b5
>>>Ba6-b7 Qd1-d2 Ne7-f5 O-O Nf5-e3 Rf1-f2 f7-f5 Nf3-g5 Qc7-b6 Qd2-d4 h7-h6 Ng5-e6
>>>Nc5xe6 Qd4xb6 a7xb6
>>>        01:37:35 4518356 26456741081 0 (460) 16 (8419398,57816187) 0.165 b4xc5
>>>Rf8-e8 Nc3-e2 Qc7xc5 c2-c4 Ne7xd5 Qd1-d4 Qc5xd4 Nf3xd4 Nd5-c3 Ke1-f1 Nc3xa2
>>>Ra1xa2 Ba6xc4 Nd4-f5 Bc4xa2
>>>depth=16 nps=4454773 ttl=32181820283 egtb=0 aborted=16950269941
>>>(afh=10504865,splits=74699573) time=7224.12s
>>>
>>>        02:22:43 5110928 43766052604 0 (460) 17 (13940238,91131639) 0.609 b4xc5
>>>Rf8-e8 Nc3-e2 Bd6xc5 d5-d6 Bc5xd6 O-O Ne7-f5 Ne2-d4 Qc7-c5 c2-c4 Ra8-d8 Kg1-h1
>>>Bd6-e5 Nf3xe5 Qc5xe5 Qd1-g4
>>>depth=17 nps=4860131 ttl=49287275195 egtb=0 aborted=25516239384
>>>(afh=15928214,splits=109235230) time=10141.14s
>>>
>>>        03:22:12 5265541 63882491771 0 (460) 18 (19968586,133598028) 0.466 b4xc5
>>>Rf8-e8 Nc3-e2 Qc7xc5 c2-c4 Ne7xd5 Qd1-d4 Qc5xd4 Nf3xd4 Nd5-c3 Ke1-f1 Nc3xa2
>>>Ra1xa2 f4-f3 Nd4xf3 Ba6xc4 Ra2-c2 Bc4-d3
>>>depth=18 nps=5272397 ttl=83276304020 egtb=0 aborted=42072953528
>>>(afh=26561999,splits=181805307) time=15794.77s
>>>
>>>        05:47:57 5993449 125127338136 0 (460) 19 (37271002,235161437) 0.293
>>>b4xc5 Rf8-e8 Nc3-e2 Bd6xc5 d5-d6 Bc5xd6 O-O Ne7-f5 Ne2-d4 Nf5xd4 Nf3xd4 Ba6xf1
>>>Kg1xf1 Ra8-d8 Bc1-b2 Bd6xa3 Ba2xf7 Kg8-h8 Ra1xa3
>>>depth=19 nps=6361727 ttl=200869193368 egtb=0 aborted=98884061155
>>>(afh=62657776,splits=376446069) time=31574.63s
>>>
>>>        11:05:57 6941105 277350209487 0 (460) 20 (81637568,463819222) 0.228
>>>b4xc5 Rf8-e8 Nc3-e2 Bd6xc5 d5-d6 Bc5xd6 O-O Ne7-f5 Ne2-d4 Nf5xd4 Nf3xd4 Ba6xf1
>>>Kg1xf1 Ra8-d8 Ra1-b1 Bd6-c5 c2-c3 Bc5-d6 Qd1-f3 Qc7-e7
>>>
>>>In the above output we see that 20 plies are reached after 11 hours. That
>>>is not so important, though. Ten plies take 94 seconds to reach. For the
>>>first 9 plies the node counts and related statistics are not collected
>>>globally; they are kept locally. They do get counted at 10 plies, so the
>>>numbers from 10 plies onward are correct. Usually the first 9 plies come
>>>from the hashtable, or go so fast that there is no need to bother 460
>>>processors with the data collection, which takes several milliseconds per
>>>ply.
>>>
>>>Now we change the parameter: the last 10 plies are stored only in the
>>>local hashtable (including the qsearch, of course), and depths above 10
>>>plies are stored in the global hashtable.
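[The two-tier scheme described in the post can be sketched in a few lines of Python. This is only an illustration of the idea, not Diep's actual code: the class name, the dict-based tables, and the 10-ply threshold constant are assumptions, and on a real cluster the global table would be a remote store with multi-microsecond access latency.]

```python
# Sketch of a two-tier transposition table for distributed search.
# Entries whose draft (remaining depth) is below DEPTH_THRESHOLD stay in a
# processor-local dict; deeper entries go to the shared "global" table,
# modeled here as a plain dict.

DEPTH_THRESHOLD = 10  # assumed value, mirroring the "last 10 ply" experiment

class TwoTierTT:
    def __init__(self):
        self.local = {}    # per-processor, cheap to access
        self.global_ = {}  # shared across processors, expensive to access

    def store(self, key, draft, entry):
        table = self.local if draft < DEPTH_THRESHOLD else self.global_
        table[key] = (draft, entry)

    def probe(self, key, draft):
        table = self.local if draft < DEPTH_THRESHOLD else self.global_
        hit = table.get(key)
        if hit is not None and hit[0] >= draft:
            return hit[1]  # stored search was at least as deep as needed
        return None

tt = TwoTierTT()
tt.store(0xABC, 12, "score=+0.21")  # deep entry -> shared table
tt.store(0xDEF, 3, "score=-1.05")   # shallow entry -> local table only
```

[The price of this split is exactly what the experiment shows: shallow entries never pay the network round-trip, but they are also invisible to the other 459 processors, so work near the leaves gets re-searched everywhere.]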
>>>
>>>Trivially, we change only the hashtable behaviour; splitting and so on is
>>>still done in the same manner. Diep does not use hashtable properties to
>>>split with this many processors (it only does so at small CPU counts), so
>>>in reality the results would come back a lot worse because of the extra
>>>time it takes to split the search tree across the different processors.
>>>Yet even then, the next result completely shocked me too:
>>>
>>>        00:00      0 0 0 (460) 1 (0,0) 2.119 b4xc5 Qc7xc5
>>>        00:00      0 0 0 (460) 2 (0,0) 2.119 b4xc5 Qc7xc5
>>>        00:02      0 0 0 (460) 3 (0,0) 2.237 b4xc5 Ra8-e8 Nc3-e4 Bd6xc5
>>>        00:02      0 0 0 (460) 4 (0,0) 2.237 b4xc5 Ra8-e8
>>>        00:14      0 0 0 (460) 5 (0,0) 1.883 b4xc5 Rf8-e8 Nc3-e2
>>>        00:27      0 0 0 (460) 6 (0,0) 1.846 b4xc5 Ra8-e8
>>>        00:36      0 0 0 (460) 7 (0,0) 2.406 b4xc5 Ne7-f5
>>>        01:04      0 0 0 (460) 8 (0,0) 1.824 b4xc5 Bd6xc5
>>>        03:25      0 0 0 (460) 9 (0,0) 1.970 b4xc5 Rf8-e8 Nc3-e4
>>>        08:38 1222567 633607672 0 (460) 10 (148568,1326106) 0.632 b4xc5 Ra8-e8
>>>Nc3-e4
>>>depth=10 nps=1248100 ttl=725720560 egtb=0 aborted=348576875
>>>(afh=194021,splits=1732467) time=581.46s
>>>
>>>        58:45 1540704 5431214025 0 (460) 11 (889614,7134097) 1.110 b4xc5 Qc7xc5
>>>depth=11 nps=1552287 ttl=5519793255 egtb=0 aborted=2338418994
>>>(afh=940275,splits=7579001) time=3555.91s
>>>
>>>        01:14:21 1567045 6991421375 0 (460) 12 (1415222,10794611) 1.065 b4xc5
>>>depth=12 nps=1592326 ttl=7402982085 egtb=0 aborted=3454686069
>>>(afh=1595556,splits=12242496) time=4649.16s
>>>
>>>        02:51:20 1859415 19115107780 0 (460) 13 (2874460,21811050) 0.444 b4xc5
>>>depth=13 nps=2204845 ttl=24139966284 egtb=0 aborted=10848686727
>>>(afh=4834769,splits=32419200) time=10948.60s
>>>
>>>Without a global hashtable you are simply dead! A difference of 5 plies
>>>in search depth. When processors get faster and latencies do not keep up,
>>>you are simply dead!
>>>
>>>Even during the tournament, where a lot of interesting things happened,
>>>when I saw this result I was completely shocked. I realized that I had to
>>>abandon all my distributed-project plans at once. In fact I *did* have
>>>the plan to make a distributed Diep version.
>>>
>>>But the result is just *too* clear. It will never play a real-time game!
>>>Five plies, with all respect: even with a branching factor of only 3.0,
>>>that is 3^5 = 243. Starting at a loss of a factor of 243, you are simply
>>>history. We are obviously talking about an exponential effect here.
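[The arithmetic behind the factor quoted in the post can be checked directly; this is a worked illustration, and the function name is mine.]

```python
# Losing k plies of depth at effective branching factor b costs roughly a
# factor b**k in search effort: the handicapped configuration must search
# about b**k times more nodes to reach the same depth.

def depth_loss_factor(branching_factor, plies_lost):
    return branching_factor ** plies_lost

print(depth_loss_factor(3.0, 5))  # 243.0, the factor quoted in the text
```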
>>>
>>>Now this triggered the next experiment I did. When using more modern
>>>processors than those in this supercomputer, you already have a similar
>>>problem to the above, but perhaps you can get away with not storing the
>>>last 2 plies into the hashtable.
>>>
>>>Note that given the latency this supercomputer has (5.8 microseconds on
>>>average just to fetch 8 bytes from another processor), it is already
>>>dubious to store everything into hashtables. So let's not store positions
>>>with depthleft <= 2 into the hashtable and then search with 460
>>>processors:
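[The store policy of this run boils down to a one-line guard before every hashtable write. A minimal sketch follows; the function and the dict-as-table are hypothetical, not Diep's API, and only the depthleft <= 2 cutoff comes from the post.]

```python
MIN_STORE_DEPTHLEFT = 3  # skip stores with depthleft <= 2, as in the text

def maybe_store(table, key, depthleft, entry):
    # Skipping shallow stores avoids a ~5.8 us remote write per entry, at
    # the price of re-searching those small subtrees on every visit.
    if depthleft >= MIN_STORE_DEPTHLEFT:
        table[key] = (depthleft, entry)

tt = {}
maybe_store(tt, 1, 2, "shallow")  # below the cutoff: not stored
maybe_store(tt, 2, 7, "deep")     # stored
```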
>>>
>>>        00:00      0 0 0 (460) 1 (0,0) 2.119 b4xc5 Qc7xc5
>>>        00:00      0 0 0 (460) 2 (0,0) 2.119 b4xc5 Qc7xc5
>>>        00:02      0 0 0 (460) 3 (0,0) 2.237 b4xc5 Ra8-e8 Nc3-e4 Bd6xc5
>>>        00:02      0 0 0 (460) 4 (0,0) 2.237 b4xc5
>>>        00:20      0 0 0 (460) 5 (0,0) 1.883 b4xc5
>>>        00:52      0 0 0 (460) 6 (0,0) 1.846 b4xc5
>>>        01:16      0 0 0 (460) 7 (0,0) 2.406 b4xc5
>>>        01:27      0 0 0 (460) 8 (0,0) 1.824 b4xc5
>>>        05:17      0 0 0 (460) 9 (0,0) 1.970 b4xc5
>>>        05:44 244602 84307066 0 (460) 10 (17774,187986) 1.204 b4xc5 Qc7xc5
>>>depth=10 nps=295935 ttl=107235208 egtb=0 aborted=60058340
>>>(afh=28985,splits=303215) time=362.36s
>>>
>>>        06:51 382970 157779897 0 (460) 11 (48121,494174) 1.095 b4xc5
>>>depth=11 nps=442290 ttl=193360776 egtb=0 aborted=114036575
>>>(afh=69008,splits=680388) time=437.18s
>>>
>>>        07:35 551816 251313751 0 (460) 12 (95107,914561) 1.050 b4xc5 Qc7xc5
>>>depth=12 nps=749278 ttl=452773756 egtb=0 aborted=287457319
>>>(afh=169753,splits=1603527) time=604.28s
>>>
>>>        11:33 1344027 932002639 0 (460) 13 (351281,2939548) 0.444 b4xc5 Qc7xc5
>>>depth=13 nps=3324907 ttl=3753055869 egtb=0 aborted=2206079184
>>>(afh=1475329,splits=8898360) time=1128.77s
>>>
>>>        21:06 3722247 4714375656 0 (460) 14 (1891633,11348602) 0.117 b4xc5
>>>        43:53 5137463 13527660552 0 (460) 14 (5270814,30097544) 0.182 b4-b5
>>>depth=14 nps=4905888 ttl=14334219942 egtb=0 aborted=8253355779
>>>(afh=5601330,splits=32690541) time=2921.84s
>>>
>>>        57:16 5159575 17732532620 0 (460) 15 (6859755,40484655) 0.351 b4-b5
>>>depth=15 nps=4679243 ttl=19836809206 egtb=0 aborted=11352746534
>>>(afh=7505339,splits=45855978) time=4239.32s
>>>
>>>        01:22:45 4792391 23797436449 0 (460) 16 (8718381,52815992) 0.160 b4-b5
>>>        01:29:42 5004482 26937627088 0 (460) 16 (9865863,59231902) 0.165 b4xc5
>>>depth=16 nps=5022682 ttl=33629317406 egtb=0 aborted=18947935034
>>>(afh=12566091,splits=78163568) time=6695.49s
>>>
>>>        02:31:09 5371057 48713667924 0 (460) 17 (17430262,102209832) 0.615 b4xc5
>>>depth=17 nps=5133183 ttl=54610551153 egtb=0 aborted=30461988599
>>>(afh=19878164,splits=120558181) time=10638.73s
>>>
>>>        03:38:29 5527223 72457973813 0 (460) 18 (25186378,148399656) 0.466 b4xc5
>>>depth=18 nps=5556667 ttl=92296420576 egtb=0 aborted=50576134199
>>>(afh=33264389,splits=201240456) time=16610.03s
>>>
>>>        06:20:37 6535370 149251520232 0 (460) 19 (47705397,264107777) 0.293
>>>b4xc5
>>>depth=19 nps=6963743 ttl=247859477324 egtb=0 aborted=131087487407
>>>(afh=83002586,splits=441367183) time=35592.85s
>>>
>>>        12:16:54 7458129 329754401671 0 (460) 20 (104829805,537922298) 0.228
>>>b4xc5
>>>
>>>In the long run it amazingly scales just as well as, if not better than,
>>>the original run. However, the problem is very easy to see: just look at
>>>the time to reach 10 plies. It goes from 1:34 to 5:44.
>>>
>>>The problem is therefore obvious: getting the CPUs to work quickly is a
>>>major problem when you do not store the last few plies into the global
>>>hashtable.
>>>
>>>The search tree simply becomes more unpredictable and more unbalanced,
>>>which is a really major problem. So searching on big clusters with fast
>>>Opteron CPUs, for example, will be really hard.
>>>
>>>I've been working for years in the parallel field, and judging by how
>>>Diep scales on this machine compared to the dinosaurs before me, I have
>>>already achieved more than anyone before in parallel search.
>>>
>>
>>Perhaps the most arrogant statement I have ever heard, or else a close second
>>to the famous "Diep has the most sophisticated evaluation of any chess program
>>ever written...  making the program unbeatable at very long time controls."
>>
>>Please, Vincent...
>
>Bob, look who's talking. You committed fraud in the official ICGA journal,
>inventing your own numbers.
>
>From the other programs searching on so many CPUs there are no known
>scaling numbers. Diep scales at 40-60%.
>
>>There have been _plenty_ of good people before you.  There are _plenty_ of
>>good people working on parallel search today.  Your results are not "better
>>than anyone's"...
>
>Both you and I know that very few people have parallelized search on such
>huge systems.
>
>In fact, I can count them on a few fingers.
>
>And none of them came close to Diep's scaling on machines similar to the
>Origin 3800.
>
>Some invented new kinds of math to make their numbers look good, like the
>Zugzwang team.
>
>You never showed any public results above 16 CPUs, so you don't have even
>a remote idea what you're talking about.
>
>I get 512 CPUs and within a year it scales at over 50%.
>
>Now you: you get yourself a cluster of 32 CPUs, and after 5 years????
>
>>
>>
>>>But when the trees get even more unbalanced than they already are
>>>nowadays, because of nullmove and a well-working global hashtable, then
>>>the problems really get massive for future game-tree researchers on
>>>massively parallel systems.
>>>
>>>Not so much the speedup after ten hours or so is the problem; basically,
>>>getting real-time scaling will be the real problem.
>>>
>>>To me it looks like a very difficult task. The problem lies in the first
>>>so many plies, and solutions are not trivial. Solutions will look like
>>>starting with nearby CPUs first, then CPUs at one router's distance, and
>>>so on.
>>
>>That is what NUMA programming is all about.  I thought you already knew
>>_everything_ there was to know about NUMA???
>>
>>
>>>
>>>Making a game-tree searching program like that is *really* complicated.
>>>
>>>Of course I'm speaking about 100+ CPUs here. Up to 64 CPUs or so I see no
>>>problems so far.
>>>
>>>So from my viewpoint the chessbrain project is doomed when it comes to
>>>playing games with a better speedup than a quad Xeon/Opteron would
>>>deliver.
>>>
>>>In fact the results have shocked me so much that I have abandoned all
>>>plans to convert Diep to MPI too. As it seems now, Diep will remain
>>>shared-memory forever. Even though the conversion to MPI is pretty
>>>trivial from the current data structure, I simply see no point in doing
>>>it!
>>
>>Fortunately, others are not so short-sighted.  You have the impression that
>>"If I can't solve it in a few weeks or months, it can't be solved."  That
>>is _rarely_ true.  Parallel search has been under development since the
>>1970's.  You had the benefit of being able to ask others "How should I
>>do this?"  They (myself included) had many answers for you, that took us
>>_years_ to fully understand as we developed them.  That will continue for a
>>long while.  I don't consider parallel/distributed search to be "impossible".
>>Just "hard".  Which means it takes a lot longer than "easy".  But "lot longer"
>>is not equal to "never".
>>
>>Fortunately some of us have more tenacity, and things will happen.
>>
>>I told you when you started that this was not easy.  You were quick to
>>put down the DB speedup numbers.  You wanted to eliminate the Cray Blitz
>>numbers because you didn't like them, not understanding that the Cray does
>>_not_ have your "NUMA latency problem" whatsoever.  And _finally_ you figure
>>out that it is hard to do.  Something a dozen people around the world (including
>>me) could (or did) tell you when you were starting.
>>
>>If you think you are the best there is at parallel programming, you are
>>_sadly_ mistaken, sorry.
>>
>>
>>
>>
>>>
>>>In a few years, when CPUs are a lot faster than they are now, getting
>>>fast CPUs to work across latencies of some microseconds is asking for
>>>trouble!
>>>
>>>We do know for sure, however, that in a few years, when CPUs are X times
>>>faster than they are now, the network latency of the fastest network
>>>cards (Myrinet) will still be around 5 us (right now 10 us) to fetch a
>>>single position.
>>
>>Myrinet is _not_ "the fastest network card around."  Myrinet is simply
>>a very affordable high-bandwidth low-latency network approach.  Our old
>>cluster has something way faster than myrinet, but it was also a very
>>expensive technology compared to myrinet.  And we certainly don't "know"
>>that network latencies won't improve.  I have no idea why you would even
>>say that, unless you have no idea where networking started (at 2400 bits
>>per second point-to-point, through coax bus-approach ethernet, to the
>>current "star UTP ethernet hubs", through fast ethernet, gigabit, myrinet,
>>cLAN and the like).  Network latencies have been dropping like a brick,
>>compared to memory latency, which has been stuck in the same place for
>>almost 20 years.
>>
>>
>>
>>>
>>>Not to mention the even bigger problem, and that's efficiently splitting
>>>the work over all CPUs and getting them going in such a way that your
>>>branching factor doesn't suffer.
>>
>>That is a problem.  But a problem is not necessarily unsolvable, just because
>>you can't solve it, or I haven't solved it.  I'm not giving up on anything
>>yet, speaking only for myself.
>>
>>
>>>
>>>The search of chess programs in 2003 simply has too good a branching
>>>factor to let a huge cluster (all supercomputers are basically clusters
>>>once they have a huge number of CPUs; even on a T3E, the latencies get
>>>expressed in microseconds) compensate for that by raw speed.
>>>
>>>Additionally, those clusters/supercomputers carry processors that are
>>>several times slower than the fastest PC processors. So you always start
>>>at another disadvantage, which some years ago simply was not there.
>>
>>The origin is not exactly based on _today's_ technology.  That was an old
>>machine in terms of today's technology.  Not all such machines are based on
>>old technology...
>>
>>
>>>
>>>>On December 21, 2003 at 20:51:19, Bob Durrett wrote:
>>>>
>>>>I am writing a huge technical report for the government about Diep on
>>>>the supercomputer. Because 512 processors are so much faster than one,
>>>>it takes a huge amount of time to calculate the speedups.
>>>>
>>>>Practically speaking, in the worst case it searched roughly 2 plies
>>>>deeper in the middlegame than a fast PC (dual Opteron) would have given
>>>>me. Considering that one CPU was in practice already 10 times slower
>>>>than a PC processor, that is pretty good.
>>
>>8-10x faster than a PC is not very impressive.  But 80-100X speedup on 512
>>processors is not half bad, assuming I extrapolated correctly from your
>>numbers...
>>
>>But why compare such a lump of coal computer to a fast opteron in the first
>>place?
>>
>>
>>>>
>>>>On some positions it performed really well. The peak was 9.9 million
>>>>nps, which for Diep is very good. A single processor gives 22k nps in
>>>>the opening, and sequentially it scaled at about 40-60% on the
>>>>supercomputer (so in practice losing about 11k nps of that 22k nps).
>>>>
>>>>And at a tournament time control, which nowadays is a lot quicker than
>>>>some years ago, I do not consider that a bad thing.
>>>>
>>>>Zugzwang and similar programs like Cilkchess ran on similar machines
>>>>and, despite having smaller evaluation functions, got a lot fewer nps on
>>>>such machines.
>>>>
>>>>Zugzwang, on a lower-latency T3E with 512 Alphas at 500 MHz (that's what
>>>>memory tells me; all these guys are good at NOT reacting to the emails I
>>>>ship them), peaked at around 2 million nodes a second. I remember it
>>>>very well.
>>>>
>>>>Diep's logfiles, just like the huge report I'm writing, will be publicly
>>>>available. That means that everyone who asks for them will get them. You
>>>>can already get the logfiles if you want.
>>>>
>>>>The test results I obtained will be in the report, and the full outputs
>>>>are publicly available too.
>>>>
>>>>The one thing missing is single-CPU runs to recalculate the test results
>>>>that were obtained in overnight runs.
>>>>
>>>>One of my conclusions will surprise some researchers (though not all of
>>>>them), and the government will dislike it. So it will be written down
>>>>carefully.
>>>>
>>>>Best regards,
>>>>Vincent
>>>>
>>>>> Hi Vincent.
>>>>>
>>>>>Have you had a chance to analyze the GRAZ performance data for your
>>>>>chess-playing program yet?
>>>>>
>>>>>What can you tell us about your technical findings?  Did you achieve the goals
>>>>>you had set for the GRAZ event?
>>>>>
>>>>>Someone said they thought you did not have the opportunity to test your software
>>>>>on the full-up computer hardware prior to the GRAZ event.  If so, did that have
>>>>>an impact?
>>>>>
>>>>>Your findings should shed light on the technical aspects of running chess
>>>>>programs on very large multiprocessor computers [>>8 processors].
>>>>>
>>>>>Bob D.

This sounds like the discourse between two scientists competing for a Nobel
Prize!  : )

Bob D.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.