Computer Chess Club Archives


Subject: Re: DTS article robert hyatt - revealing his bad math

Author: Vincent Diepeveen

Date: 10:45:09 09/03/02



On September 03, 2002 at 13:31:15, Uri Blass wrote:

There are 24 positions x 1, 2, 4, 8 and 16 processors,
so there is quite a lot of data. You can see it in the
first ICCA Journal issue from 1997, where Bob describes DTS.

If I claim an average speedup of 1.90, then there is
a range of about 1.85 - 1.949 that the measured speedups
can fall in. However, Bob's speedups all fall in only
1/10 of that range. So for every rounded number there is
about a 1/10 chance.

A few numbers are off by .01; that is usually a round-off
error. He modified his data a little, but not enough to get
outside the error margin of a statistical analysis of the
data.

In short, for every round number there is about a 1/10 chance
of it happening by accident.

With about 30 such round numbers in the table below, that is
(1/10)^30 = about 1 / 10^30.
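
A minimal sketch of that back-of-envelope estimate (Python; the count
of 30 round entries and the 1/10-per-number chance are the only inputs,
both taken from the argument above, not measured from anything):

  # back-of-envelope: chance that 30 independent, honestly measured
  # speedups all land on a 'round' value, assuming each single value
  # has about a 1/10 chance of looking that round
  round_entries = 30
  p_single = 1.0 / 10.0
  print(p_single ** round_entries)   # 1e-30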

>On September 03, 2002 at 13:22:45, Uri Blass wrote:
>
>>On September 03, 2002 at 12:58:23, Vincent Diepeveen wrote:
>>
>>>On September 03, 2002 at 12:54:05, Matthew Hull wrote:
>>>
>>>>On September 03, 2002 at 12:33:12, Vincent Diepeveen wrote:
>>>>
>>>>>On September 03, 2002 at 12:28:55, Matthew Hull wrote:
>>>>>
>>>>>>On September 03, 2002 at 11:56:48, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>We all know how many failures parallel programs developed by
>>>>>>>scientists have been over the past years. This year's DIEP show at
>>>>>>>the Teras was no exception to that. I had only 3 days of preparation
>>>>>>>time to get onto the machine (and up to 5 days before the tournament
>>>>>>>I wasn't sure whether I would get system time *anyway*).
>>>>>>>
>>>>>>>However, sponsors want to hear how well your thing did. On a 1024
>>>>>>>processor machine (maximum allocation 512 processors within 1
>>>>>>>shared-memory partition) of which you get 60, with memory bandwidth
>>>>>>>2 times slower than local RAM, and let's not even *start* to discuss
>>>>>>>the latency, otherwise you will never stop fearing for DIEP on that
>>>>>>>machine. All I can say about it is that the 20-times-slowed-down
>>>>>>>Zugzwang ran in 1999 on a machine with better latency...
>>>>>>>
>>>>>>>I'm working hard now to get a DIEP DTS NUMA version ready.
>>>>>>>
>>>>>>>It is DTS because it splits dynamically wherever it wants to.
>>>>>>>
>>>>>>>Over a month of fulltime work has been done now. Tests on a dual K7
>>>>>>>as well as on dual supercomputer processors have been very positive.
>>>>>>>
>>>>>>>Nevertheless I worried about how to report on it. So I checked out
>>>>>>>the article from Robert Hyatt again. Already in 1999, when I had
>>>>>>>implemented a pc-DTS version, I wondered why I never got near Bob's
>>>>>>>speedups when I was not forward pruning other than with nullmove.
>>>>>>>With the 1999 world championship version I had great speedups, but I
>>>>>>>could explain them all by the forward pruning I was using at the time.
>>>>>>>
>>>>>>>I never got close, even on a dual or quad Xeon, to the speedups
>>>>>>>reported by Bob for his DTS version described in 1997. I concluded
>>>>>>>that it had to do with a number of things, encouraged by Bob's
>>>>>>>statements. In '99 Bob explained that splitting was very cheap on
>>>>>>>the Cray: he copied a 64KB block with all the data from processor 0
>>>>>>>to P1 within 1 clock on the Cray.
>>>>>>>
>>>>>>>I didn't know much about Crays or supercomputers at the time, except
>>>>>>>that they were out of my budget, so I believed it. However, I have a
>>>>>>>good memory for certain numbers, so I have remembered his statement
>>>>>>>very well.
>>>>>>>
>>>>>>>In 2002 Bob explained the Cray could copy 16 bytes each clock. A
>>>>>>>BIG contradiction of his 1999 statement. No one here will be
>>>>>>>surprised by that, because regarding Deep Blue we have already seen
>>>>>>>hundreds of contradictory statements from Bob. Anyway, that of
>>>>>>>course makes splitting on the Cray very expensive, considering Bob
>>>>>>>copied 64KB of data for each split. Crafty is no exception here.
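>>>>>>>
>>>>>>>A quick sanity check of those two figures (a sketch in Python, using
>>>>>>>only the numbers above):
>>>>>>>
>>>>>>>  # clocks needed to copy one 64KB split block at 16 bytes/clock
>>>>>>>  block_bytes = 64 * 1024   # data copied per split (Cray Blitz)
>>>>>>>  bytes_per_clock = 16      # the 2002 figure
>>>>>>>  print(block_bytes // bytes_per_clock)  # 4096 clocks, not 1 clock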
>>>>>>>
>>>>>>>I never believed the 2.0 speedup for 2 processors in his table on
>>>>>>>page 16, because when I do a similar test I also sometimes get > 2.0,
>>>>>>>but usually less.
>>>>>>>
>>>>>>>Singular extensions hurt DIEP's speedup incredibly, but even today
>>>>>>>I cannot, within a few minutes of search, get to the speedup Bob
>>>>>>>achieved in his 1997 article.
>>>>>>>
>>>>>>>In 1999 I wondered why his speedup was so good. When I asked, Bob
>>>>>>>concluded that he split in a smarter way. So obviously I asked how
>>>>>>>he split in Cray Blitz, because what Bob is doing in Crafty is too
>>>>>>>horrible for DIEP to get a speedup much above 1.5 anyway.
>>>>>>>
>>>>>>>The answer was: "do some statistical analysis yourself on game trees
>>>>>>>to find a way to split well, it can't be hard; i could do it too in
>>>>>>>cray blitz but my source code is gone. No one has it anymore".
>>>>>>>
>>>>>>>So you can imagine my surprise when, after 1999, he suddenly had
>>>>>>>data of Crafty versus Cray Blitz, which Bob quotes in CCC to this
>>>>>>>day to prove how good his thing was.
>>>>>>>
>>>>>>>Anyway, I can analyze games as an FM, so I already knew a bit about
>>>>>>>how good this Cray Blitz was. I never paid much attention to Bob's
>>>>>>>lies here.
>>>>>>>
>>>>>>>I thought he was doing this in order to save himself time digging up old
>>>>>>>source code.
>>>>>>>
>>>>>>>Now, after a month of fulltime work on DIEP on the supercomputer,
>>>>>>>having it working great on a dual (with very little overhead) but
>>>>>>>still with a bad speedup, I started worrying about my speedup and
>>>>>>>the future article to write about it.
>>>>>>>
>>>>>>>So the bad speedup of today's software, compared to Bob's program
>>>>>>>from 1993 that he wrote about in 1997, is perhaps explained by
>>>>>>>nullmove. Bob still denies this despite a lot of statistical data
>>>>>>>at loads of positions (150 positions tried in total), even with
>>>>>>>CRAFTY.
>>>>>>>
>>>>>>>Bob doesn't find those results significant. He also says that not a
>>>>>>>single one of MY tests is valid, because I have a stupid PC with 2
>>>>>>>processors and bad RAM; a dual would hurt Crafty's performance too
>>>>>>>much.
>>>>>>>
>>>>>>>This is because I also concluded that the speedup Crafty gets here
>>>>>>>is between 1.01 and 1.6, not 1.7.
>>>>>>>
>>>>>>>The data suggests that Crafty's speedup on his own quad is about
>>>>>>>2.8, where he claims 3.1.
>>>>>>>
>>>>>>>Then Bob referred back to his 1997 article, saying the test method
>>>>>>>wasn't good, because to get that 2.8 we used cleared hashtables,
>>>>>>>whereas in his article he cheats a little by not clearing the tables
>>>>>>>at all. To simulate a game-playing environment that's OK, of course.
>>>>>>>
>>>>>>>However, there is a small problem with his article: the search times
>>>>>>>and speedup numbers are complete fraud. If I divide the 1-cpu times
>>>>>>>by the speedups Bob claims, I get nearly perfect numbers.
>>>>>>>
>>>>>>>Here is the result for the first 10 positions, based upon Bob's
>>>>>>>article in the March 1997 ICCA Journal (issue #1 of that year); the
>>>>>>>tables with the results are on page 16:
>>>>>>>
>>>>>>>When DIEP searches a position the speedup is always an odd-looking
>>>>>>>number. If I claim a speedup of 1.8 then it is usually 1.7653 or
>>>>>>>1.7920 or 1.8402 and so on. Not with Bob. Bob knows nothing about
>>>>>>>statistical analysis of data (I must plead ignorance here too, but
>>>>>>>at least I am not STUPID like Bob here):
>>>>>>>
>>>>>>>pos   2      4      8   16
>>>>>>>1  2.0000 3.40   6.50   9.09
>>>>>>>2  2.00   3.60   6.50  10.39
>>>>>>>3  2.0000 3.70   7.01  13.69
>>>>>>>4  2.0000 3.90   6.61  11.09
>>>>>>>5  2.0000 3.6000 6.51   8.98876
>>>>>>>6  2.0000 3.70   6.40   9.50000
>>>>>>>7  1.90   3.60   6.91  10.096
>>>>>>>8  2.000  3.700  7.00  10.6985
>>>>>>>9  2.0000 3.60   6.20   9.8994975 = 9.90
>>>>>>>10 2.000  3.80   7.300 13.000000000000000
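>>>>>>>
>>>>>>>A minimal sketch (Python) of what I mean; the values are simply the
>>>>>>>first four rows of the table above, and the check is the distance of
>>>>>>>each reported speedup to the nearest multiple of 0.1:
>>>>>>>
>>>>>>>  # how close each reported speedup sits to a 'round' 0.1 grid point
>>>>>>>  reported = [2.0, 3.4, 6.5, 9.09, 2.0, 3.6, 6.5, 10.39,
>>>>>>>              2.0, 3.7, 7.01, 13.69, 2.0, 3.9, 6.61, 11.09]
>>>>>>>  for s in reported:
>>>>>>>      dist = abs(s - round(s * 10) / 10)  # max possible is 0.05
>>>>>>>      print(s, round(dist, 3))
>>>>>>>  # every distance is 0.0 or 0.01; honest measurements like 1.7653
>>>>>>>  # would spread over the whole 0.0 .. 0.05 range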
>>>>>>>
>>>>>>>This clearly PROVES that he cheated completely about all the search
>>>>>>>times from 1 processor to 8 processors. Of course, now that I am
>>>>>>>running on supercomputers myself, I know what the problem is. A
>>>>>>>month ago I needed only a 30-minute look to see what the problem in
>>>>>>>Crafty is, and most likely it was the problem in Cray Blitz too. The
>>>>>>>problem is that Crafty copies 44KB of data or so (Cray Blitz 64KB)
>>>>>>>and while doing that it holds smp_lock. That's too costly with more
>>>>>>>than 2 cpus.
>>>>>>>
>>>>>>>This shows he completely lied about his speedups. All times
>>>>>>>from 1-8 cpus are complete fraud.
>>>>>>>
>>>>>>>There is, however, also evidence that he didn't compare the same
>>>>>>>versions: the Cray Blitz node counts are also weird.
>>>>>>>
>>>>>>>The more processors you use, the more overhead you have, obviously.
>>>>>>>Please don't get mad at me for calculating it in the following
>>>>>>>simple but very convincing way. I will do it only for his first
>>>>>>>position's node counts at 1..16 cpus. The formula predicts the node
>>>>>>>count at the next processor count:
>>>>>>>
>>>>>>>  predicted_nodes(next) = (nodes(i) / speedup(i)) * speedup(next)
>>>>>>>
>>>>>>>For 1 to 2 cpus we don't need the math. If you finish exactly 2
>>>>>>>times faster, yet need more nodes with more cpus (where you need
>>>>>>>expensive splits), then that's already weird of course, though not
>>>>>>>impossible.
>>>>>>>
>>>>>>>2 to 4 cpus:
>>>>>>>  3.4 * (89052012 / 2.0) = 151,388,420.4 nodes predicted.
>>>>>>>  Bob needed: 105,025,123, which in itself is possible. Simply
>>>>>>>  something like 40% extra overhead for 4 processors which 2 do
>>>>>>>  not have. This is very well possible.
>>>>>>>
>>>>>>>4 to 8 cpus:
>>>>>>>  6.5 * (105025123 / 3.4) = 200,783,323 nodes predicted.
>>>>>>>  Bob needed: 109 million nodes.
>>>>>>>  That means at 8 cpus the overhead is already rapidly approaching
>>>>>>>  100%. This is very well possible: the more cpus, the bigger the
>>>>>>>  overhead.
>>>>>>>
>>>>>>>8 to 16 cpus:
>>>>>>>  9.1 * (109467495 / 6.5) = 153,254,493 nodes predicted.
>>>>>>>  Bob needed: 155,514,410.
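>>>>>>>
>>>>>>>The same three steps as a small Python sketch (node counts and
>>>>>>>speedups taken straight from the article's first position):
>>>>>>>
>>>>>>>  # extrapolate node counts: nodes(i) / speedup(i) * speedup(next)
>>>>>>>  nodes   = {2: 89052012, 4: 105025123, 8: 109467495, 16: 155514410}
>>>>>>>  speedup = {2: 2.0, 4: 3.4, 8: 6.5, 16: 9.1}
>>>>>>>  for i, nxt in [(2, 4), (4, 8), (8, 16)]:
>>>>>>>      predicted = nodes[i] / speedup[i] * speedup[nxt]
>>>>>>>      print(i, '->', nxt, int(predicted), 'reported:', nodes[nxt])
>>>>>>>  # the 8 -> 16 step lands within 2% of the reported count, i.e.
>>>>>>>  # almost no extra overhead, unlike the earlier steps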
>>>>>>>
>>>>>>>My dear fellow programmers, this is impossible.
>>>>>>>
>>>>>>>Where is the overhead?
>>>>>>>
>>>>>>>Where is the overhead of at least 100%?
>>>>>>>
>>>>>>>More likely a factor of 3 in overhead.
>>>>>>>
>>>>>>>The only explanation I can come up with is that the node counts
>>>>>>>for 2..8 processors were produced by a different version of
>>>>>>>Cray Blitz than the 16-processor version.
>>>>>>>
>>>>>>>From the single-cpu version we already know the node count has to
>>>>>>>be off, because it used a smaller hashtable (see section 4.1 in the
>>>>>>>article, second line after 'testing methodology').
>>>>>>>
>>>>>>>We are talking about massive fraud here.
>>>>>>>
>>>>>>>Of course this article is 5 years old, and I do not know whether he
>>>>>>>created the table back in 1993.
>>>>>>>
>>>>>>>How am I going to tell my sponsor that my speedup won't be the same
>>>>>>>as that from the 1997 article? To whom do I compare, Zugzwang? It
>>>>>>>'only' had on paper a 50% speedup out of 512 processors. Of course
>>>>>>>that is also not realistic. However, Feldmann documented most of the
>>>>>>>things he did in order to cripple Zugzwang to get a better speedup.
>>>>>>>
>>>>>>>A well-known trick is to kick out nullmove and use only plain
>>>>>>>alpha-beta instead of PVS or other forms of search. Even Deep Blue
>>>>>>>did that :)
>>>>>>>
>>>>>>>But what do you guys think of this alternative bookkeeping from Bob?
>>>>>>>
>>>>>>>Best regards,
>>>>>>>Vincent
>>>>>>
>>>>>>
>>>>>>It sounds like you are saying in effect, "If I cannot duplicate Bob's
>>>>>>performance numbers with DIEP, then Bob's claims are false".
>>>>>
>>>>>No. Please look at the data.
>>>>>
>>>>>There is about a 1 / 10^30 chance of getting such data.
>>>>>
>>>>>In short, he has made up the data. He 'invented' the search
>>>>>times himself.
>>>
>>>I am not talking about my machine here. I am talking about the
>>>fraud committed by Bob.
>>>
>>>pos   2      4      8   16
>>>1  2.0000 3.40   6.50   9.09
>>>2  2.00   3.60   6.50  10.39
>>>3  2.0000 3.70   7.01  13.69
>>>4  2.0000 3.90   6.61  11.09
>>>5  2.0000 3.6000 6.51   8.98876
>>>6  2.0000 3.70   6.40   9.50000
>>>7  1.90   3.60   6.91  10.096
>>>8  2.000  3.700  7.00  10.6985
>>>9  2.0000 3.60   6.20   9.8994975 = 9.90
>>>10 2.000  3.80   7.300 13.000000000000000
>>>
>>>There is a chance smaller than 1/10^30 that such numbers happen
>>>'by accident'. That's 0.0000000000000000000000000000001,
>>>with about 30 zeros before the 1.
>>
>>I do not think that the probability is 1/10^30.
>>I guess that the 13 is based on times.
>>If the numbers are based on times in 1/1000 seconds then it is possible.
>>
>>You may get 737/1000 seconds with 16 processors and exactly 737*13/1000
>>seconds on one processor.
>>
>>This is rare, but not so rare as to be impossible.
>>
>>If you choose a random number for the 1 processor you have a probability
>>of about 1/737 of getting similar behaviour.
>>
>>Uri
>
>I can add that when you have a lot of positions it is not surprising
>that one number comes out .0000000, and the 737 units of time for the 16
>processors was only a guess; it could be even less.
>
>I do not understand the data, but the numbers are not uniformly
>distributed if you get them by dividing 2 integers.
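>
>A rough check of that point (a Python sketch; the 737 ms and the window
>of candidate 1-cpu times are just example values, not data from the
>article):
>
>  # chance that t1/t16 is exactly 13.0 when both times are integer
>  # milliseconds and t16 = 737 ms
>  t16 = 737
>  window = range(9000, 10200)          # candidate 1-cpu times in ms
>  hits = sum(1 for t1 in window if t1 / t16 == 13.0)
>  print(hits, '/', len(window))        # 1 / 1200: rare, not 1/10^30 rare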
>
>Uri


