Computer Chess Club Archives


Subject: DTS article robert hyatt - revealing his bad math

Author: Vincent Diepeveen

Date: 08:56:48 09/03/02


We all know how many failures parallel programs developed by scientists
have been in the past years. This year's diep show at the teras was no
exception to that. I had only 3 days of preparation time on the
machine (and up to 5 days before the tournament
i wasn't sure whether i would get system time *at all*).

However, sponsors want to hear how well your thing did. This was on a 1024
processor machine (maximum allocation 512 processors within 1 partition
of shared memory) of which you get 60, with memory bandwidth
2 times slower than local ram, and let's not even *start* to discuss
the latency, otherwise you will never start to fear diep using that
machine. All i can say about it is that the 20 times slowed down
Zugzwang ran in 1999 on a machine with better latency...

I'm working hard now to get a DIEP DTS NUMA version ready.

It is DTS because it splits dynamically wherever it wants to.

Over a month of fulltime work has gone into it now. Tests on a dual K7
as well as on dual supercomputer processors have been very positive.

Nevertheless i worried about how to report on it. So i checked out the
article from Robert Hyatt again. Already in 1999, when i had implemented
a pc-DTS version, i wondered why i never got near the speedups of bob
when i was not forward pruning other than nullmove. With the 1999 world
champs version i had great speedups, but i could explain all of them by
the forward pruning i was using at the time.

I never got close, even on a dual xeon or quad xeon, to the speedups
reported by Bob for his DTS version described in 1997. I concluded that it
had to do with a number of things, encouraged by Bob's statements. In 99 bob
explained that splitting was very cheap at the cray. He copied a block with
all data of 64KB from processor 0 to P1 within 1 clock at the cray.

I didn't know much of crays or supercomputers at the time, except that
they were out of my budget so i believed it. However i have a good memory
for certain numbers, so i have remembered his statement very well.

In 2002 Bob explained the cray could copy 16 bytes each clock. A
BIG contradiction to his 1999 statement. No one here will wonder
about that, because regarding deep blue we have already seen hundreds
of contradicting statements from bob. Anyway, that makes
splitting at the cray of course very expensive, considering bob copied
64KB data for each split. Crafty is no exception here.
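
To put a number on that, here is a minimal Python sketch. It assumes only the two figures quoted above (a 64KB block copied per split, 16 bytes copied per clock) and nothing else about the machine; the function name is made up for illustration.

```python
def copy_clocks(block_bytes, bytes_per_clock):
    """Clocks needed to copy a block, rounded up to whole clocks."""
    return -(-block_bytes // bytes_per_clock)  # ceiling division

SPLIT_BLOCK_BYTES = 64 * 1024  # 64KB per split, figure quoted in the post
BYTES_PER_CLOCK = 16           # 2002 figure quoted in the post

clocks_per_split = copy_clocks(SPLIT_BLOCK_BYTES, BYTES_PER_CLOCK)
print(clocks_per_split)  # 4096
```

Under these assumptions a single split costs 4096 clocks for the copy alone, against the 1 clock of the 1999 statement.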

I never believed the 2.0 speedup in his table at page 16 for 2 processors,
because if i do a similar test i sometimes also get > 2.0, but usually less.

Singular extensions hurt diep's speedup incredibly, but even today
i cannot, within a few minutes of search, get to the speedup bob achieved
in his 1997 article.

In 1999 i wondered about why his speedup was so good.
When i asked, Bob concluded he split in a smarter way.
Then i obviously asked how he split in cray blitz, because
what bob is doing in crafty is too horrible for DIEP to get a speedup
much above 1.5 anyway.

The answer was: "do some statistical analysis yourself on game trees
to find a way to split well it can't be hard, i could do it too in
cray blitz but my source code is gone. No one has it anymore".

So you can feel my surprise when he suddenly had data of crafty versus
cray blitz after 1999, which bob quotes in CCC to this day to prove how
good his thing was.

Anyway, i can analyze games as FM, so i already knew a bit about how well
this cray blitz was. I never paid much attention to the lies of bob here.

I thought he was doing this in order to save himself time digging up old
source code.

Now, after a month of fulltime work on DIEP at the supercomputer, having
it working great at a dual (with very little overhead) but still with a bad
speedup, i started worrying about my speedup and the future article to
write about it.

So a possible explanation for the bad speedup of today's software, compared
to bob's thing from 1993 which he wrote about in 1997, is perhaps
nullmove. Bob still denies this despite a lot of statistical data
at loads of positions (150 positions in total tried), even with CRAFTY.

Bob doesn't find those results significant. Also he says that not a
single one of MY tests is valid because i have a stupid PC with 2
processors and bad RAM. A dual would hurt crafty's performance too much.

This because i also concluded that the speedup crafty gets here
is between 1.01 and 1.6 and not 1.7.

Data suggests that crafty's speedup at his own quad is about 2.8,
where he claims 3.1.

Then bob referred back to his 1997 thesis, saying that the test method
wasn't good. Because to get that 2.8 we used cleared hashtables, and in his
thesis he cheats a little by not clearing the tables at all. To simulate a
game playing environment that's ok of course.

However there is a small problem with his article. The search times and
speedup numbers are complete fraud. If i divide the times of 1 cpu by
the speedup bob claims he has, i get nearly perfect numbers.

Here is the result for the first 10 positions based upon bob's article
march 1997 in icca issue #1 that year, the tables with the results
are on page 16:

When diep searches a position the speedup is always a weird number.
If i claim a speedup of 1.8 then it is usually 1.7653 or 1.7920 or 1.8402
and so on. Not with bob. Bob knows nothing of statistical analysis
of data (i must claim innocence here too but i am at least not STUPID
like bob here):

pos   2      4      8   16
1  2.0000 3.40   6.50   9.09
2  2.00   3.60   6.50  10.39
3  2.0000 3.70   7.01  13.69
4  2.0000 3.90   6.61  11.09
5  2.0000 3.6000 6.51   8.98876
6  2.0000 3.70   6.40   9.50000
7  1.90   3.60   6.91  10.096
8  2.000  3.700  7.00  10.6985
9  2.0000 3.60   6.20   9.8994975 = 9.90
10 2.000  3.80   7.300 13.000000000000000
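
How close the table sits to round numbers can be counted directly. The sketch below is only an illustration: the values are copied from the table above, while the helper name and the 0.02 cutoff are assumptions chosen for the example.

```python
# Speedups per position for 2, 4, 8 and 16 cpu's, copied from the table above.
SPEEDUPS = [
    [2.0, 3.4, 6.5, 9.09],
    [2.0, 3.6, 6.5, 10.39],
    [2.0, 3.7, 7.01, 13.69],
    [2.0, 3.9, 6.61, 11.09],
    [2.0, 3.6, 6.51, 8.98876],
    [2.0, 3.7, 6.4, 9.5],
    [1.9, 3.6, 6.91, 10.096],
    [2.0, 3.7, 7.0, 10.6985],
    [2.0, 3.6, 6.2, 9.8994975],
    [2.0, 3.8, 7.3, 13.0],
]

def distance_to_tenth(x):
    """Absolute distance from x to the nearest multiple of 0.1."""
    return abs(x - round(x * 10) / 10)

TOLERANCE = 0.02  # assumed cutoff for "nearly a round tenth"

values = [s for row in SPEEDUPS for s in row]
close = [s for s in values if distance_to_tenth(s) <= TOLERANCE]
print(len(close), "of", len(values))  # 40 of 40
```

If the digits after the first decimal were spread evenly, only about 40% of the values would be expected to land within 0.02 of a tenth.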

This clearly PROVES that he has cheated completely about all
search times from 1 processor to 8 processors. Of course
now that i am running at supercomputers myself i know what
the problem is. I only needed a 30 minute look a month ago
to see what the problem is in crafty, and most likely that was
also the problem in cray blitz. The problem is that crafty
copies 44KB data or so (cray blitz 64KB) and while doing that
it is using smp_lock. That's too costly with more than 2 cpu's.

This shows he completely lied about his speedups. All times
from 1-8 cpu's are complete fraud.

There is however also evidence he didn't compare the same
versions. Cray Blitz node counts are also weird.

The more processors you use the more overhead you have obviously.
Please don't get mad at me for calculating it in the next simple
but very convincing way. I will do it only for his first node
counts at 1..16 cpu's, the formula is:
  (nodes at n cpu's / speedup at n cpu's) * speedup at 2n cpu's

1 to 2 cpu's we don't need the math.
If you need exactly 2 times shorter to get to it but
thereby you need more nodes at more cpu's (where you need
expensive splits) then that's already weird of course, though
not impossible.

2 to 4 cpu's:
 3.4 * (89052012 / 2.0) = 151388420.4 nodes.
  bob needed: 105.025.123 which in itself is possible.
  Simply like 40% overhead extra for 4 processors which 2 do
  not have. This is very well possible.

4 to 8 cpu's:
  6.5 * 105025123 nodes / 3.4 = 200.783.323
  bob needed: 109MLN nodes
  That means at 8 cpu's the overhead is already approaching
  100% rapidly. This is very well possible. The more cpu's
  the bigger the overhead.

8 to 16 cpu's:
  9.1 * (109467495 / 6.5) = 153254493
  bob needed: 155.514.410
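
The three steps above can be redone in a few lines of Python. This is only a sketch of the formula as stated; the function name is made up, and the node counts and speedups are the ones quoted in this post for the first position.

```python
def projected_nodes(nodes_prev, speedup_prev, speedup_next):
    """(nodes at n cpu's / speedup at n cpu's) * speedup at 2n cpu's."""
    return nodes_prev / speedup_prev * speedup_next

# cpu count -> (reported node count, claimed speedup), as quoted in the post
REPORTED = {
    2: (89052012, 2.0),
    4: (105025123, 3.4),
    8: (109467495, 6.5),
    16: (155514410, 9.1),
}

projections = {}
for cpus in (2, 4, 8):
    nodes, speedup = REPORTED[cpus]
    next_nodes, next_speedup = REPORTED[cpus * 2]
    projections[cpus * 2] = projected_nodes(nodes, speedup, next_speedup)
    # Ratio of projected to reported nodes at the next cpu count
    ratio = projections[cpus * 2] / next_nodes
    print(f"{cpus} -> {cpus * 2} cpu's: projected {projections[cpus * 2]:.0f}, "
          f"reported {next_nodes}, ratio {ratio:.2f}")
```

The ratio of projected to reported nodes comes out around 1.44 and 1.83 for the first two steps, and just under 1 for the step from 8 to 16 cpu's.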

My dear fellow programmers. This is impossible.

Where is the overhead?

The factor 100% at least overhead?

More likely factor 3 overhead.

The only explanation i can come up with is that the node counts
from 2..8 processors are created by a different version from
Cray Blitz than the 16 processor version.

From the single cpu version we already know the number of nodes gotta
be weird because it is using a smaller hashtable (see page 4.1 in the
article second line there after 'testing methodology').

We talk about mass fraud here.

Of course this article is from 5 years ago and i do not know whether
he created the table in 1993.

How am i going to tell my sponsor that my speedup won't be the same
as that from the 1997 article? To whom do i compare? Zugzwang?
It 'only' had on paper a 50% speedup out of 512 processors. Of course also
something which is not realistic. However Feldmann documented most of
the things he did in order to cripple zugzwang to get a better speedup.

A well known trick is to kick out nullmove and only use normal alpha-beta
instead of PVS or other forms of search. Even deep blue did that :)

But what do you guys think of this alternative bookkeeping by Bob?

Best regards,
Vincent




Last modified: Thu, 15 Apr 21 08:11:13 -0700
