Computer Chess Club Archives

Subject: Re: Amir should use the Quad 1.9 Ghz instead of the 8x 1.6 !

Author: Robert Hyatt
Date: 10:20:09 01/30/03
On January 30, 2003 at 05:05:44, Vincent Diepeveen wrote:

>On January 29, 2003 at 23:31:19, Matthew Hull wrote:
>
>>On January 29, 2003 at 23:20:11, Vincent Diepeveen wrote:
>>
>>>On January 29, 2003 at 12:06:50, Robert Hyatt wrote:
>>>
>>>Bob let me explain to you. DIEP is written for machines which have a bit slower
>>>latency for global memory accesses. Whereas the world champs 2002 version wasn't
>>>like that and would probably act like crafty on that 8 processor, the end of
>>>august 2002 versions and further are using a new type of parallelism which
>>>doesn't need much locking. Each processor takes care of itself without hurting
>>>bandwidth while searching too much.
>>>
>>>There are no dead slow global locks, which are what kills the 8 processor thing
>>>of course.
>>>
>>>therefore it works great for example at cc-NUMA machines and all types of Xeon
>>>machines.
>>
>>
>>Wow dude.  Impressive.  Could you supply some time-to-ply benchmarks for Diep on
>>8-way Xeon vis-a-vis 4-way Xeon?  That would refute the professor like nothing
>>else.
>
>right now i have no system time at a 8 way Xeon. i do have numbers for 1 .. n
>processors at cc-NUMA SGI Origin 3800. Where i currently tested basically a lot
>up to 16 cpu's and plan to extend that with short tests to > 32 cpu's.
>
>It is not easy to get a good speedup with a lot of processors. But a good
>speedup is something completely different from a near-linear speedup in nodes a
>second.
>
>Bob is complaining about his crafty not getting good nps a second. Well we
>already know that from the dual K7. single cpu 1 million nps or something and
>dual only 1.5 MLN nps there or something (forgive me a 100k nps more or less
>here; it just doesn't get a good sequential speedup even at it).

Vincent, your ignorance never ceases to amaze me.  What part of "I ran 8 different
applications, including crafty" don't you understand?  What part of "I ran an
application once, and then I ran it up to 8 times in parallel" don't you understand???
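
For anyone who wants to repeat that kind of test, here is a rough sketch of the method: time one copy of a program, then time 2, 3, ... N identical copies started together, and compare wall-clock times. The names run_copies, scaling_table, and WORKLOAD are made up for illustration, and the workload is only a stand-in, not any of the applications actually run on the 8-way box.

```python
import subprocess
import sys
import time

# Hypothetical stand-in workload: a small memory-touching Python one-liner.
# Substitute the real application's command line here.
WORKLOAD = [sys.executable, "-c", "a = list(range(200000)); s = sum(a[::7])"]


def run_copies(n, cmd):
    """Start n identical copies of cmd together; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def scaling_table(max_copies, cmd=WORKLOAD):
    """Return (copies, seconds, slowdown relative to a single copy) for 1..max_copies."""
    rows = []
    base = None
    for n in range(1, max_copies + 1):
        elapsed = run_copies(n, cmd)
        if base is None:
            base = elapsed  # time for one copy alone
        rows.append((n, elapsed, elapsed / base))
    return rows
```

If the machine scaled perfectly and had at least as many processors as copies, the slowdown column would stay near 1.0 all the way up. A slowdown that climbs as copies are added points at a shared resource such as the memory bus, since the copies do not communicate with each other.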


>
>So it is trivial that crafty *won't* run well on 8 cpu Xeon not to mention a
>cc-NUMA machine.
>
>Extending that claim then to other software as well i find a very bad taste.

You would.  Except for the small flaw that I actually _ran_ on the box, with
several different applications.  You've never touched an 8-way xeon in your
life, and you start handwaving about CC-NUMA which has _nothing_ to do
with the 8-way xeon.


>
>Saying it is very very hard to write software that runs well on them is a
>completely different statement from that.

But that is not the statement I wrote.  I said "I found the 8-way box to be only
about 1.5X faster than my quad using _several_ applications, only one of which
(Crafty) was actually a parallel algorithm."
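
A small sketch of the arithmetic behind a figure like "1.5X", assuming throughput is counted as copies finished per second of total run time. The run times below are hypothetical placeholders chosen to reproduce a 1.5X ratio; they are not measured numbers, and the function names are invented for the example.

```python
def throughput_ratio(copies_a, seconds_a, copies_b, seconds_b):
    """How many times more work per second machine A completes than machine B."""
    return (copies_a / seconds_a) / (copies_b / seconds_b)


def scaling_efficiency(observed_ratio, cpu_ratio):
    """Fraction of the ideal speedup (same clock, cpu_ratio times the CPUs) actually obtained."""
    return observed_ratio / cpu_ratio


# Hypothetical timings: the quad finishes 4 copies in 100 s,
# the 8-way finishes 8 copies in 400/3 (about 133.3) s.
ratio = throughput_ratio(8, 400 / 3, 4, 100)   # about 1.5
efficiency = scaling_efficiency(ratio, 8 / 4)  # about 0.75
```

With the same clock speed and twice the processors, the ideal ratio would be 2.0, so a 1.5X result means roughly a quarter of the added hardware is lost before any parallel search overhead is counted.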


>
>Speedup at the machines is a different matter of course. 8 processors for diep
>is the breaking number where the very well division of the x86 algorithm is not
>working that great anymore and where the well working division of the many
>processor splitting is still not working well.
>
>Special tuning would be needed how to split at 8 cpu's when compared to 2..4
>IMHO.
>
>the way in which i split at x86 is working pretty ok at 8 cpu's but can be
>optimized further to avoid several milliseconds of cpu's getting idled.
>
>From my viewpoint optimizing the way DIEP splits is done for :
>  2 processors (x86 also works well for 4 processors)
>
>and n processors (n > 16). speedup at 16 processors currently not very good.
>of course that is the *absolute* speedup. No cheated toying like the Feldmann
>group who is focussing upon number of nodes searched instead of searchtimes.
>
>Of course the ideal way to get a 100% speedup with the feldmann way of measuring
>how good their algorithms are is a very bugged program that by accident can only
>use 1 cpu, so that will get a 100% speedup then always whereas in my model it is
>n times slower :)
>
>>Sincerely,
>>Matt
>>
>>
>>>
>>>Now you have some examples of software written for fast latency shared memory
>>>machines and then claim the thing is slower because the software isn't written
>>>for such types of machines?
>>>
>>>That already should give you the answer. Writing parallel programs is 1 thing.
>>>Writing something that works well without inventing numbers yourself is another
>>>thing.
>>>
>>>
>>>>On January 29, 2003 at 11:38:37, Vincent Diepeveen wrote:
>>>>
>>>>>On January 28, 2003 at 10:33:15, Robert Hyatt wrote:
>>>>>
>>>>>>On January 28, 2003 at 09:07:35, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On January 28, 2003 at 03:33:44, Mig Greengard wrote:
>>>>>>>
>>>>>>>>According to the tech I talked with, Amir and Shay were testing both machines
>>>>>>>>before the match to see which one they would use. To my knowledge it wasn't
>>>>>>>>decided until a day or two before the match. Obviously there isn't a big
>>>>>>>>difference in performance.
>>>>>>>>
>>>>>>>>Saludos, Mig
>>>>>>>>http://www.chessninja.com
>>>>>>>
>>>>>>>thanks.
>>>>>>>
>>>>>>>DIEP onto the 8 processor 1.6 would be running 16 processes and speed would
>>>>>>>be about expressed in K7:
>>>>>>>  8 x 1.6 Ghz / 1.4 = 9 Ghz
>>>>>>
>>>>>>
>>>>>>No it wouldn't.  You haven't tried an 8-way intel box yet.  It doesn't scale
>>>>>>nearly as well as the 2-way and 4-way intel boxes do.  The chipset for
>>>>>>supporting 8 cpus is simply not very good...
>>>>>
>>>>>DIEP isn't demanding much bandwidth Bob in case you missed it, it works
>>>>>great on a cc-NUMA machine too.
>>>>
>>>>It demands _enough_ bandwidth.  My comment wasn't only about "crafty".  It was
>>>>about the 8-way boxes in general.  I ran on a dell 8450, with 8 700mhz xeon
>>>>processors, and it was about 1.5X faster than my box.  And again, _not_ with
>>>>Crafty.  I ran 8 copies of the same thing on the 8450, 4 copies on the quad,
>>>>and compared the total run times.  The 8450 was only about 50% faster when it
>>>>should be 100% based on clock...
>>>>
>>>>
>>>>
>>>>>
>>>>>>The 8-way box using the same clock speed for the processors will only be about
>>>>>>1.5X faster than the 4-way box, and that doesn't count parallel search overhead
>>>>>>at all.
>>>>>
>>>>>That's not true. It's 8 times faster for good software. Of course there is
>>>>>algorithmic loss but there is no sequential loss unless the software sucks,
>>>>>to say it rude.
>>>>
>>>>Have you ever run on one?  Of course not.  I have.  So your "that's not true"
>>>>is simply nonsense...  There are _plenty_ of good benchmarks that can be used
>>>>to draw conclusions about the 8-way memory bottleneck problem.
>>>>
>>>>It _might_ be 8x faster if you can fit in the L2 cache (this machine had
>>>>2mb of L2 per processor compared to my 1mb on my quad 700).  But if you have
>>>>any memory bandwidth at all, it has a problem.  And a 8-probe hash table is
>>>>more than enough to highlight the problem.
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>Doesn't say that it is easy to make software that can handle the latencies.
>>>>>
>>>>>It sure isn't easy to make a chessprogram that has a good speedup
>>>>>(without a too big sequential loss first like Zugzwang which was slowed down
>>>>>first like 100 times or so in order to then have a decent speedup at like
>>>>>256 processors; a 50% speedup is even incredibly much, i would be *very* happy
>>>>>with around 15% already).
>>>>>
>>>>>But it is possible to make.
>>>>>
>>>>>DIEP is such a program that shows it can. DIEP runs like the sun on 8 cpu's
>>>>>(2 nodes quad SGI), even at the slowest partitions (slowest latency speeds
>>>>>are of course at the biggest partitions: 512 cpu partition).
>>>>>
>>>>>A 8 processor Xeon is hell for pc software like Fritz, Junior, Crafty, but it
>>>>>is very good for DIEP.
>>>>>
>>>>>Best regards,
>>>>>Vincent
Last modified: Thu, 15 Apr 21 08:11:13 -0700