Author: Vincent Diepeveen
Date: 10:45:09 09/03/02
On September 03, 2002 at 13:31:15, Uri Blass wrote:

There are 24 positions x 1, 2, 4, 8, 16 processors, so there is quite a lot of data. You can see it in the first ICCA issue of 1997, where Bob describes DTS. If I claim an average speedup of 1.90, then there is a domain of about 1.85 - 1.949 into which the speedups fall. However, Bob's speedups all fall in only 1/10 of it. So for every rounded number there is a 1/10 chance. A few numbers that are off by .01 are usually just round-off error. He modified his data a little, but not enough to get outside the error margin of a statistical analysis of the data. In short, for every round number there is about a 1/10 chance of it happening by accident; 1/10^(24*3.5) is about 1/10^30.
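For concreteness, here is a minimal Python sketch of that back-of-envelope estimate. The 1/10-per-value chance and the "24 positions x ~3.5 round columns" count are the post's own assumptions, not measured quantities; note that the product actually comes out far smaller than the 1/10^30 figure quoted in this thread.

```python
# A minimal sketch of the back-of-envelope estimate above, using the post's
# own assumptions: each honestly measured speedup has ~1/10 chance of
# looking this "round", and about 3.5 of the 4 columns per position do.

positions = 24      # test positions in the 1997 DTS article
round_cols = 3.5    # columns per position that look suspiciously round
p_round = 0.1       # assumed chance of one round value by accident

exponent = positions * round_cols
print(f"(1/10)^{exponent:.0f} = {p_round ** exponent:.1e}")
# prints: (1/10)^84 = 1.0e-84
# Note: that is far below the 1/10^30 quoted in the thread, so the post's
# exponent arithmetic understates its own case (or overcounts the columns).
```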
>On September 03, 2002 at 13:22:45, Uri Blass wrote:
>
>>On September 03, 2002 at 12:58:23, Vincent Diepeveen wrote:
>>
>>>On September 03, 2002 at 12:54:05, Matthew Hull wrote:
>>>
>>>>On September 03, 2002 at 12:33:12, Vincent Diepeveen wrote:
>>>>
>>>>>On September 03, 2002 at 12:28:55, Matthew Hull wrote:
>>>>>
>>>>>>On September 03, 2002 at 11:56:48, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>We all know how many failures parallel programs developed by scientists have been over the past years. This year's DIEP showing at the Teras was no exception. I had only 3 days of preparation time on the machine (and up to 5 days before the tournament I wasn't sure whether I would get system time *anyway*).
>>>>>>>
>>>>>>>However, sponsors want to hear how well your program did. This at a 1024-processor machine (maximum allocation 512 processors within one shared-memory partition), of which you get 60, with memory bandwidth 2 times slower than local RAM; and let's not even *start* to discuss the latency, otherwise you will really start to fear for DIEP on that machine. All I can say about it is that the 20-times-slowed-down Zugzwang ran in 1999 on a machine with faster latency...
>>>>>>>
>>>>>>>I'm working hard now to get a DIEP DTS NUMA version ready.
>>>>>>>
>>>>>>>DTS it is, because it splits dynamically wherever it wants to.
>>>>>>>
>>>>>>>Over a month of fulltime work has been done now. Tests on a dual K7 as well as on two supercomputer processors have been very positive.
>>>>>>>
>>>>>>>Nevertheless I worried about how to report on it, so I checked out Robert Hyatt's article again. Already in 1999, when I had implemented a PC-DTS version, I wondered why I never got near Bob's speedups when I was not forward pruning other than nullmove. With the 1999 world championship version I had great speedups, but I could explain them all by the forward pruning I was using at the time.
>>>>>>>
>>>>>>>I never got close, even on a dual or quad Xeon, to the speedups reported by Bob for the DTS version described in 1997. I concluded that it had to do with a number of things, encouraged by Bob's statements. In '99 Bob explained that splitting was very cheap on the Cray: he copied a 64KB block with all split data from processor 0 to P1 within 1 clock.
>>>>>>>
>>>>>>>I didn't know much about Crays or supercomputers at the time, except that they were out of my budget, so I believed it. However, I have a good memory for certain numbers, so I remembered his statement very well.
>>>>>>>
>>>>>>>In 2002 Bob explained the Cray could copy 16 bytes each clock. A BIG contradiction to his 1999 statement. No one here will wonder about that, because regarding Deep Blue we have already seen hundreds of contradicting statements from Bob. Anyway, that of course makes splitting on the Cray very expensive, considering Bob copied 64KB of data for each split: at 16 bytes per clock that is 65536 / 16 = 4096 clocks per split. Crafty is no exception here.
>>>>>>>
>>>>>>>I never believed the 2.0 speedup for 2 processors in his table on page 16, because when I do a similar test I also sometimes get > 2.0, but usually less.
>>>>>>>
>>>>>>>Singular extensions hurt DIEP's speedup incredibly, but even today I cannot get within a few minutes of search to the speedup Bob achieved in his 1997 article.
>>>>>>>
>>>>>>>In 1999 I wondered why his speedup was so good. When I asked, Bob concluded that he split in a smarter way. So obviously I asked how he split in Cray Blitz, because what Bob is doing in Crafty is too horrible for DIEP to get a speedup much above 1.5 anyway.
>>>>>>>
>>>>>>>The answer was: "do some statistical analysis yourself on game trees to find a way to split well, it can't be hard; I could do it too in Cray Blitz, but my source code is gone. No one has it anymore."
>>>>>>>
>>>>>>>So you can imagine my surprise when after 1999 he suddenly had data of Crafty versus Cray Blitz, which Bob quotes in CCC to this day to prove how good his program was.
>>>>>>>
>>>>>>>Anyway, I can analyze games as an FM, so I already knew a bit about how strong this Cray Blitz was. I never paid much attention to Bob's lies here.
>>>>>>>
>>>>>>>I thought he was doing this in order to save himself the time of digging up old source code.
>>>>>>>
>>>>>>>Now, after a month of fulltime work on DIEP at the supercomputer, having it work great on a dual (with very little overhead) but still with a bad speedup, I started worrying about my speedup and the future article to write about it.
>>>>>>>
>>>>>>>So a possible explanation for the bad speedup of today's software, compared to Bob's program from 1993 that he wrote about in 1997, is nullmove. Bob still denies this despite a lot of statistical data at loads of positions (150 positions tried in total), even with CRAFTY.
>>>>>>>
>>>>>>>Bob doesn't find those results significant. He also says that not a single one of MY tests is valid, because I have a stupid PC with 2 processors and bad RAM; a dual would hurt Crafty's performance too much.
>>>>>>>
>>>>>>>This because I also concluded that the speedup Crafty gets here is between 1.01 and 1.6, not 1.7.
>>>>>>>
>>>>>>>Data suggests that Crafty's speedup on his own quad is about 2.8, where he claims 3.1.
>>>>>>>
>>>>>>>Then Bob referred back to his 1997 article, saying the test method wasn't good, because to get that 2.8 we used cleared hashtables, whereas in his article he cheats a little by not clearing the tables at all. To simulate a game-playing environment that's OK of course.
>>>>>>>
>>>>>>>However, there is a small problem with his article. The search times and speedup numbers are complete fraud. If I divide the 1-cpu times by the speedups Bob claims, I get nearly perfect numbers.
>>>>>>>
>>>>>>>Here are the results for the first 10 positions, based upon Bob's article in the ICCA issue of March 1997 (issue #1 that year); the tables with the results are on page 16:
>>>>>>>
>>>>>>>When DIEP searches a position the speedup is always an odd number. If I claim a speedup of 1.8, then it is usually 1.7653 or 1.7920 or 1.8402 and so on. Not with Bob. Bob knows nothing about statistical analysis of data (I must plead ignorance here too, but at least I am not STUPID like Bob here):
>>>>>>>
>>>>>>>pos   2       4       8       16
>>>>>>>1     2.0000  3.40    6.50    9.09
>>>>>>>2     2.00    3.60    6.50    10.39
>>>>>>>3     2.0000  3.70    7.01    13.69
>>>>>>>4     2.0000  3.90    6.61    11.09
>>>>>>>5     2.0000  3.6000  6.51    8.98876
>>>>>>>6     2.0000  3.70    6.40    9.50000
>>>>>>>7     1.90    3.60    6.91    10.096
>>>>>>>8     2.000   3.700   7.00    10.6985
>>>>>>>9     2.0000  3.60    6.20    9.8994975 = 9.90
>>>>>>>10    2.000   3.80    7.300   13.000000000000000
>>>>>>>
>>>>>>>This clearly PROVES that he cheated completely on all search times from 1 processor to 8 processors. Of course, now that I am running on supercomputers myself, I know what the problem is. A month ago I needed only a 30-minute look to see what the problem in Crafty is, and most likely that was the problem in Cray Blitz as well. The problem is that Crafty copies 44KB of data or so (Cray Blitz 64KB), and while doing that it holds smp_lock. That's too costly with more than 2 cpus.
>>>>>>>
>>>>>>>This shows he completely lied about his speedups. All times from 1-8 cpus are complete fraud.
>>>>>>>
>>>>>>>There is, however, also evidence that he didn't compare the same versions. The Cray Blitz node counts are also weird.
>>>>>>>
>>>>>>>Obviously, the more processors you use, the more overhead you have. Please don't get mad at me for calculating it in the following simple but very convincing way. I will do it only for his first position's node counts at 1..16 cpus; the formula is:
>>>>>>>  projected_nodes(i+1 cpus) = (nodes(i cpus) / speedup(i cpus)) * speedup(i+1 cpus)
>>>>>>>
>>>>>>>From 1 to 2 cpus we don't need the math. If you finish exactly 2 times faster, yet at more cpus you need more nodes (because of expensive splits), that is already weird of course, though not impossible.
>>>>>>>
>>>>>>>2 to 4 cpus:
>>>>>>>  3.4 * (89,052,012 / 2.0) = 151,388,420.4 nodes.
>>>>>>>  Bob needed: 105,025,123, which in itself is possible. Simply some 40% extra overhead for 4 processors which 2 do not have. This is very well possible.
>>>>>>>
>>>>>>>4 to 8 cpus:
>>>>>>>  6.5 * (105,025,123 / 3.4) = 200,783,323 nodes.
>>>>>>>  Bob needed: 109 million nodes.
>>>>>>>  That means at 8 cpus the overhead is already approaching 100% rapidly. This is very well possible. The more cpus, the bigger the overhead.
>>>>>>>
>>>>>>>8 to 16 cpus:
>>>>>>>  9.1 * (109,467,495 / 6.5) = 153,254,493 nodes.
>>>>>>>  Bob needed: 155,514,410.
>>>>>>>
>>>>>>>My dear fellow programmers, this is impossible.
>>>>>>>
>>>>>>>Where is the overhead?
>>>>>>>
>>>>>>>The at least 100% overhead?
>>>>>>>
>>>>>>>More likely a factor 3 overhead.
>>>>>>>
>>>>>>>The only explanation I can come up with is that the node counts for 2..8 processors were created by a different version of Cray Blitz than the 16-processor version.
>>>>>>>
>>>>>>>From the single-cpu version we already know the node count has to be odd, because it uses a smaller hashtable (see 4.1 in the article, the second line there after 'testing methodology').
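For readers who want to verify the arithmetic, here is a minimal Python sketch of the projection quoted above, using the position-1 node counts and speedups exactly as given in the post (the projection formula is the post's own, not a standard benchmark method):

```python
# Scale the node count at i cpus by the ratio of claimed speedups to
# predict the count at the next step, then compare with the reported count.
# All numbers below are the ones quoted in the post for position 1.

speedup = {2: 2.0, 4: 3.4, 8: 6.5, 16: 9.1}
nodes   = {2: 89_052_012, 4: 105_025_123, 8: 109_467_495, 16: 155_514_410}

for lo, hi in [(2, 4), (4, 8), (8, 16)]:
    projected = nodes[lo] / speedup[lo] * speedup[hi]
    ratio = projected / nodes[hi]
    print(f"{lo:>2} -> {hi:<2} cpus: projected {projected:>13,.0f}, "
          f"reported {nodes[hi]:>11,}, projected/reported = {ratio:.2f}")

#  2 -> 4  cpus: projected   151,388,420, reported 105,025,123, ratio 1.44
#  4 -> 8  cpus: projected   200,783,323, reported 109,467,495, ratio 1.83
#  8 -> 16 cpus: projected   153,254,493, reported 155,514,410, ratio 0.99
# The post's point: the gap grows through 8 cpus, then vanishes entirely
# at the 16-cpu step, which is the step it calls impossible.
```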
>>>>>>>We are talking about mass fraud here.
>>>>>>>
>>>>>>>Of course this article is 5 years old, and I do not know whether he created the table in 1993.
>>>>>>>
>>>>>>>How am I going to tell my sponsor that my speedup won't be the same as that in the 1997 article? To whom do I compare, Zugzwang? It 'only' had on paper a 50% speedup out of 512 processors. Of course that is also not realistic. However, Feldmann documented most of the things he did to cripple Zugzwang in order to get a better speedup.
>>>>>>>
>>>>>>>A well-known trick is to kick out nullmove and use plain alpha-beta instead of PVS or other forms of search. Even Deep Blue did that :)
>>>>>>>
>>>>>>>But what do you guys think of this alternative bookkeeping from Bob?
>>>>>>>
>>>>>>>Best regards,
>>>>>>>Vincent
>>>>>>
>>>>>>
>>>>>>It sounds like you are saying, in effect: "If I cannot duplicate Bob's performance numbers with DIEP, then Bob's claims are false".
>>>>>
>>>>>No. Please look at the data.
>>>>>
>>>>>There is a 1/10^30 chance that you get such data.
>>>>>
>>>>>In short, he has made up the data. The search times he 'invented' himself.
>>>
>>>I am not talking about my machine here. I am talking about the fraud committed by Bob.
>>>
>>>pos   2       4       8       16
>>>1     2.0000  3.40    6.50    9.09
>>>2     2.00    3.60    6.50    10.39
>>>3     2.0000  3.70    7.01    13.69
>>>4     2.0000  3.90    6.61    11.09
>>>5     2.0000  3.6000  6.51    8.98876
>>>6     2.0000  3.70    6.40    9.50000
>>>7     1.90    3.60    6.91    10.096
>>>8     2.000   3.700   7.00    10.6985
>>>9     2.0000  3.60    6.20    9.8994975 = 9.90
>>>10    2.000   3.80    7.300   13.000000000000000
>>>
>>>There is a chance smaller than 1/10^30 that such numbers happen 'by accident'; that's 0.0000000000000000000000000000001, with about 30 zeros before the 1.
>>
>>I do not think that the probability is 1/10^30.
>>I guess that the 13 is based on times.
>>If the numbers are based on time in 1/1000 seconds, then it is possible.
>>
>>You may get 737/1000 seconds with 16 processors and exactly 737*13/1000 seconds on one processor.
>>
>>This is rare, but not so rare as to be impossible.
>>
>>If you choose a random number for the 1-processor time, you have a probability of 1/737 of getting similar behaviour.
>>
>>Uri
>
>I can add that, when you have a lot of positions, it is not surprising that one number is .0000000; the 737 time units for the 16 processors was only a guess, and it can be even less.
>
>I do not understand anything about the data, but the numbers are not uniformly distributed if you get them by dividing 2 integers.
>
>Uri
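Uri's objection is easy to check empirically. Here is a minimal simulation sketch, assuming times are integer counts of some clock unit and that the plausible single-cpu times span a window of about one 16-cpu time around the exact multiple (both assumptions are mine, for illustration):

```python
# Simulates Uri's point: if measured times are integers (e.g. milliseconds),
# the speedup t1/t16 is a ratio of two integers, so an exact 13.000000...
# happens with probability on the order of 1/t16, not ~1e-30 per number.

import random

random.seed(1)
trials = 1_000_000
t16 = 737                                   # Uri's example parallel time
target = 13 * t16                           # t1 giving exactly speedup 13
hits = 0
for _ in range(trials):
    # assumed window: t1 uniform over 737 integers centered on 13*t16
    t1 = random.randint(target - t16 // 2, target + t16 // 2)
    if t1 == target:
        hits += 1

print(f"empirical P(speedup == 13 exactly) = {hits / trials:.2e}")
print(f"Uri's 1/737 estimate               = {1 / 737:.2e}")
# Both come out around 1.4e-3: rare for a single position, but across
# dozens of measurements one exact value is not astonishing.
```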