Computer Chess Club Archives


Subject: Re: DTS article robert hyatt - revealing his bad math

Author: Matthew Hull

Date: 09:54:05 09/03/02


On September 03, 2002 at 12:33:12, Vincent Diepeveen wrote:

>On September 03, 2002 at 12:28:55, Matthew Hull wrote:
>
>>On September 03, 2002 at 11:56:48, Vincent Diepeveen wrote:
>>
>>>We all know how many failures parallel programs developed by scientists
>>>have been in past years. This year's DIEP showing at the Teras was no
>>>exception to that. I had only 3 days of preparation time to get onto
>>>the machine (and up to 5 days before the tournament I wasn't sure
>>>whether I would get system time *anyway*).
>>>
>>>However, sponsors want to hear how well your program did. This was on a
>>>1024-processor machine (maximum allocation 512 processors within 1
>>>partition of shared memory), of which I got 60, with memory bandwidth
>>>2 times slower than local RAM, and let's not even *start* to discuss
>>>the latency, otherwise you will never fear DIEP on that machine. All I
>>>can say about it is that the 20-times-slowed-down Zugzwang ran in 1999
>>>on a machine with lower latency...
>>>
>>>I'm working hard now to get a DIEP DTS NUMA version ready.
>>>
>>>It is DTS because it splits dynamically wherever it wants to.
>>>
>>>Over a month of fulltime work has been done on it now. Tests on a dual
>>>K7 as well as on dual supercomputer processors have been very positive.
>>>
>>>Nevertheless, I worried about how to report on it. So I checked out
>>>Robert Hyatt's article again. Already in 1999, when I had implemented a
>>>PC-DTS version, I wondered why I never got near Bob's speedups when I
>>>was not forward pruning with anything other than nullmove. With the
>>>1999 world championship version I had great speedups, but I could
>>>explain them all by the forward pruning I was using at the time.
>>>
>>>I never got close, even on a dual or quad Xeon, to the speedups Bob
>>>reported for the DTS version he described in 1997. I concluded that it
>>>had to do with a number of things, encouraged by Bob's statements. In
>>>'99 Bob explained that splitting was very cheap on the Cray: he copied
>>>a 64KB block with all the split data from processor 0 to P1 within 1
>>>clock on the Cray.
>>>
>>>I didn't know much about Crays or supercomputers at the time, except
>>>that they were out of my budget, so I believed it. However, I have a
>>>good memory for certain numbers, so I have remembered his statement
>>>very well.
>>>
>>>In 2002 Bob explained that the Cray could copy 16 bytes per clock. A
>>>BIG contradiction of his 1999 statement. No one here will be surprised
>>>by that, because regarding Deep Blue we have already seen hundreds of
>>>contradictory statements from Bob. Anyway, that of course makes
>>>splitting on the Cray very expensive, considering Bob copied 64KB of
>>>data for each split (64KB at 16 bytes per clock is 4096 clocks, not
>>>1). Crafty is no exception here.
>>>
>>>I never believed the 2.0 speedup for 2 processors in his table on page
>>>16, because when I do a similar test I also sometimes get > 2.0, but
>>>usually less.
>>>
>>>Singular extensions hurt DIEP's speedup incredibly, but even today, at
>>>searches of a few minutes, I cannot get to the speedup Bob achieved in
>>>his 1997 article.
>>>
>>>In 1999 I wondered why his speedup was so good. When I asked, Bob
>>>concluded that he split in a smarter way. So obviously I asked how he
>>>split in Cray Blitz, because what Bob is doing in Crafty is too
>>>horrible for DIEP to get a speedup much above 1.5 anyway.
>>>
>>>The answer was: "Do some statistical analysis yourself on game trees to
>>>find a way to split well; it can't be hard. I could do it too in Cray
>>>Blitz, but my source code is gone. No one has it anymore."
>>>
>>>So you can imagine my surprise when, after 1999, he suddenly had data
>>>of Crafty versus Cray Blitz, which Bob quotes in CCC to this day to
>>>prove how good his program was.
>>>
>>>Anyway, I can analyze games as an FM, so I already knew a bit about how
>>>good this Cray Blitz was. I never paid much attention to Bob's lies
>>>here.
>>>
>>>I thought he was doing this in order to save himself the time of
>>>digging up old source code.
>>>
>>>Now, after a month of fulltime work on DIEP at the supercomputer,
>>>having it working great on a dual (with very little overhead) but
>>>still with a bad speedup, I started worrying about my speedup and the
>>>future article to write about it.
>>>
>>>So a possible explanation for the bad speedup of today's software,
>>>compared to what Bob ran in 1993 and wrote about in 1997, is perhaps
>>>nullmove. Bob still denies this despite a lot of statistical data over
>>>loads of positions (150 positions tried in total), even with CRAFTY.
>>>
>>>Bob doesn't find those results significant. He also says that not a
>>>single one of MY tests is valid because I have a stupid PC with 2
>>>processors and bad RAM; a dual would hurt Crafty's performance too
>>>much.
>>>
>>>This because I also concluded that the speedup Crafty gets here is
>>>between 1.01 and 1.6, not 1.7.
>>>
>>>Data suggests that Crafty's speedup on his own quad is about 2.8,
>>>where he claims 3.1.
>>>
>>>Then Bob referred back to his 1997 thesis, saying the test method
>>>wasn't good, because to get that 2.8 we used cleared hash tables,
>>>while in his thesis he cheats a little by not clearing the tables at
>>>all. To simulate a game-playing environment that's OK, of course.
>>>
>>>However, there is a small problem with his article. The search times
>>>and speedup numbers are complete fraud. If I divide the 1-cpu times by
>>>the speedups Bob claims, I get nearly perfect numbers (see the small
>>>sketch after the table below).
>>>
>>>Here is the result for the first 10 positions, based on Bob's article
>>>in the March 1997 ICCA Journal (issue #1 of that year); the tables
>>>with the results are on page 16:
>>>
>>>When DIEP searches a position, the speedup is always a weird number.
>>>If I claim a speedup of 1.8, then it is usually 1.7653 or 1.7920 or
>>>1.8402 and so on. Not with Bob. Bob knows nothing about statistical
>>>analysis of data (I must plead ignorance here too, but I am at least
>>>not STUPID like Bob here):
>>>
>>>pos  2cpu   4cpu   8cpu   16cpu
>>>1  2.0000 3.40   6.50   9.09
>>>2  2.00   3.60   6.50  10.39
>>>3  2.0000 3.70   7.01  13.69
>>>4  2.0000 3.90   6.61  11.09
>>>5  2.0000 3.6000 6.51   8.98876
>>>6  2.0000 3.70   6.40   9.50000
>>>7  1.90   3.60   6.91  10.096
>>>8  2.000  3.700  7.00  10.6985
>>>9  2.0000 3.60   6.20   9.8994975 = 9.90
>>>10 2.000  3.80   7.300 13.000000000000000
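>>>
>>>To make the check concrete, here is a small Python sketch of it. The
>>>times in it are made-up examples (NOT numbers from Bob's article); the
>>>only point is that real measurements give "weird" speedups like 1.7906,
>>>while times back-calculated from a round claimed speedup divide out to
>>>exactly that round number again:
>>>
>>>  # speedup is simply the 1-cpu search time divided by the n-cpu time
>>>  def speedup(time_1cpu, time_ncpu):
>>>      return time_1cpu / time_ncpu
>>>
>>>  # a real measurement: raw clock times in seconds (hypothetical values)
>>>  print(round(speedup(187.3, 104.6), 4))   # -> 1.7906, a weird number
>>>
>>>  # a back-calculated time: start from the 1-cpu time and a round speedup
>>>  t1 = 187.3
>>>  t4 = t1 / 3.40               # n-cpu time reconstructed from the claim
>>>  print(round(speedup(t1, t4), 4))         # -> 3.4 exactly, by construction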
>>>
>>>This clearly PROVES that he cheated completely about all the search
>>>times from 1 to 8 processors. Of course, now that I am running on
>>>supercomputers myself, I know what the problem is. A month ago I only
>>>needed a 30-minute look to see what the problem is in Crafty, and most
>>>likely it was the same problem in Cray Blitz. The problem is that
>>>Crafty copies 44KB of data or so (Cray Blitz 64KB), and while doing
>>>that it is holding smp_lock. That's too costly with more than 2 CPUs.
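>>>
>>>A rough back-of-the-envelope model of why copying under one global lock
>>>stops scaling. The two constants below are pure assumptions of mine
>>>(not measurements of Crafty or Cray Blitz); the shape of the result is
>>>the point, not the exact numbers:
>>>
>>>  # every split copies a block while holding one global lock, so splits
>>>  # serialize; the lock saturates once n * copy_time / split_interval
>>>  # reaches 1, and extra CPUs then mostly wait
>>>  copy_time      = 50e-6   # assumed: seconds to copy ~44KB under the lock
>>>  split_interval = 300e-6  # assumed: seconds between splits per CPU
>>>
>>>  for n in (2, 4, 8, 16):
>>>      demand = n * copy_time / split_interval
>>>      print(f"{n:2d} cpus: lock demand {demand:.2f}"
>>>            f" ({'saturated' if demand >= 1 else 'ok'})")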
>>>
>>>This shows he completely lied about his speedups. All times
>>>from 1-8 cpu's are complete fraud.
>>>
>>>There is, however, also evidence that he didn't compare the same
>>>versions. The Cray Blitz node counts are also weird.
>>>
>>>Obviously, the more processors you use, the more overhead you have.
>>>Please don't get mad at me for calculating it in the following simple
>>>but very convincing way. I will do it only for his first set of node
>>>counts at 1..16 CPUs; the formula for the projected node count at the
>>>next configuration is (a small script reproducing the three steps
>>>follows below):
>>>  projected_nodes(next) = (nodes(this) / speedup(this)) * speedup(next)
>>>
>>>For 1 to 2 CPUs we don't need the math. If you need exactly 2 times
>>>less time, but at the same time you need more nodes with more CPUs
>>>(where you need expensive splits), then that is of course already
>>>weird, though not impossible.
>>>
>>>2 to 4 CPUs:
>>>  3.4 * (89052012 / 2.0) = 151,388,420.4 nodes projected.
>>>  Bob needed: 105,025,123, which in itself is possible.
>>>  Simply something like 40% extra overhead for 4 processors which 2 do
>>>  not have. This is very well possible.
>>>
>>>4 to 8 CPUs:
>>>  6.5 * (105025123 / 3.4) = 200,783,323 nodes projected.
>>>  Bob needed: 109 million nodes.
>>>  That means at 8 CPUs the overhead is already approaching 100%
>>>  rapidly. This is very well possible. The more CPUs, the bigger the
>>>  overhead.
>>>
>>>8 to 16 CPUs:
>>>  9.1 * (109467495 / 6.5) = 153,254,493 nodes projected.
>>>  Bob needed: 155,514,410.
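>>>
>>>The same arithmetic as a small Python sketch, using only the node
>>>counts and speedups quoted above from the article and the formula
>>>given above, nothing else:
>>>
>>>  # projected_nodes = nodes(this config) / speedup(this) * speedup(next),
>>>  # compared with the node count reported for the next configuration
>>>  steps = [
>>>      # (cpus, nodes, speedup, next_cpus, next_speedup, next_reported_nodes)
>>>      (2,  89052012, 2.0,  4, 3.4, 105025123),
>>>      (4, 105025123, 3.4,  8, 6.5, 109467495),
>>>      (8, 109467495, 6.5, 16, 9.1, 155514410),
>>>  ]
>>>
>>>  for cpus, nodes, s, ncpus, ns, reported in steps:
>>>      projected = nodes / s * ns
>>>      print(f"{cpus:2d} -> {ncpus:2d} cpus: projected {projected:13,.0f},"
>>>            f" reported {reported:13,d}, ratio {projected / reported:.2f}")
>>>
>>>The first two steps come out well above the reported counts; the 8 to
>>>16 step comes out at a ratio of about 0.99, which is the point here.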
>>>
>>>My dear fellow programmers. This is impossible.
>>>
>>>Where is the overhead?
>>>
>>>The overhead of at least 100%?
>>>
>>>More likely a factor of 3 in overhead.
>>>
>>>The only explanation I can come up with is that the node counts for
>>>2..8 processors were produced by a different version of Cray Blitz
>>>than the 16-processor version.
>>>
>>>From the single-CPU version we already know the node count has to be
>>>weird, because it is using a smaller hash table (see section 4.1 in
>>>the article, the second line there after 'testing methodology').
>>>
>>>We are talking about massive fraud here.
>>>
>>>Of course this article is 5 years old, and I do not know whether he
>>>created the table back in 1993.
>>>
>>>How am I going to tell my sponsor that my speedup won't be the same as
>>>that of the 1997 article? To whom do I compare, Zugzwang? It 'only'
>>>had, on paper, a 50% speedup out of 512 processors, of course also
>>>something which is not realistic. However, Feldmann documented most of
>>>the things he did in order to cripple Zugzwang to get a better
>>>speedup.
>>>
>>>A well-known trick is to kick out nullmove and use only normal
>>>alpha-beta instead of PVS or other forms of search. Even Deep Blue did
>>>that :)
>>>
>>>But what do you guys think of this alternative bookkeeping from Bob?
>>>
>>>Best regards,
>>>Vincent
>>
>>
>>It sounds like you are saying in effect, "If I cannot duplicate Bob's
>>performance numbers with DIEP, then Bob's claims are false".
>
>No. Please look at the data.
>
>There is a 1 in 10^30 chance of getting such data.
>
>In short, he has made up the data. He has 'invented' the search times
>himself.

Perhaps if you had a good understanding of and experience with Cray
architecture, your statement would carry more weight.  But the supercomputer
you are using is really very different from a Cray; that much I do know.  You
can't expect to get the same performance on a fundamentally different
architecture.

It's the same with the AMD versus Xeon memory architecture.  They're not the
same.  A Xeon with interleaved memory has an advantage here.  Everyone
acknowledges that an AMD is not going to get as good a speedup as a Xeon with
interleaved memory, as has been explained countless times already.

The same is true for supercomputers.  The designs and special hardware
advantages differ significantly.

You can't prove a lie by comparing apples to oranges.

>
>I hope you realize that.
>
>It shows very clearly that he cheated. There is no way to escape statistical
>analysis, even though in computer chess most dudes do not know what it is.
>
>They do not know you can catch fraud with statistical analysis.
>
>Bob sure didn't.
>
>>To an outside observer, this would not necessarily follow.  It remains for
>>the reader to wonder whether a person making such a statement is necessarily
>>up to the task.  You might be a great programmer.  You might be a journeyman
>>programmer.  You might be a sub-par programmer.  How are we to know?
>>
>>I for one cannot simply take your word for it.


