Computer Chess Club Archives


Subject: Re: MP system info

Author: Vincent Diepeveen

Date: 11:36:36 05/28/02


On May 28, 2002 at 12:53:44, Robert Hyatt wrote:

>On May 28, 2002 at 10:06:42, Vincent Diepeveen wrote:
>
>>On May 28, 2002 at 09:06:36, K. Burcham wrote:
>>
>>For computer chess that is way too optimistic, Kim.
>>
>>Programs like Cray Blitz or DIEP might do pretty well at
>>8 processors, but Crafty, Fritz, SOS, Shredder, Patzer,
>>Junior and similar programs
>>scale pretty badly at 8 processors.
>
>
>What on earth are you talking about when you mention Crafty?  I have
>run crafty on 16 cpu machines and it works just as well as it does on
>4...  From actual testing, not "speculation".

I'm talking about worst-case speedup,

*not* average case or best case. We both know that on some
test-set positions you could get a 100 times speedup with some luck
on a 16-processor machine.
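
To make the worst-case versus average-case distinction concrete, here is a minimal sketch. The timings are made-up illustration values, not measurements from Crafty, DIEP, or any other program, and `speedup_summary` is a hypothetical helper name.

```python
def speedup_summary(serial_times, parallel_times):
    """Summarize per-position speedups (serial time / parallel time)."""
    speedups = [s / p for s, p in zip(serial_times, parallel_times)]
    return {
        "worst": min(speedups),
        "average": sum(speedups) / len(speedups),
        "best": max(speedups),
    }


# Hypothetical search times in seconds for four test positions.
# These numbers are invented for illustration only.
serial = [100.0, 80.0, 120.0, 60.0]
parallel = [50.0, 8.0, 30.0, 20.0]

summary = speedup_summary(serial, parallel)
for name, value in summary.items():
    print(f"{name}: {value:.2f}x")
```

With this invented data the average looks healthy because one lucky position dominates it, while the worst case is what you actually get in the position where things go wrong.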

>
>>
>>bandwidth is not the issue here. speedup is the issue here.
>>
>
>
>For the 8-way machines, bandwidth _is_ the issue.  4-way boxes use
>4-way interleaving to provide enough memory bandwidth for 4 cpus.
>8-way boxes lose in two ways.  (1) they still use 4-way memory
>interleaving;  (2) the cache coherency hardware treats the machine as
>two "clusters" of 4 cpus, making "inter-cluster" cache coherency less
>efficient than on the 4-way clusters...

I didn't know that about 8-processor machines. That makes 8-processor
machines even more interesting.

>
>
>>If you split at random like most of these programs do, then
>>you simply run into major speedup problems soon.
>>
>>In the case of Patzer a big issue is that it extends tactically
>>a lot, so the search space is not identical (I don't even
>>know whether it runs on 8 processors).
>>
>>So where Crafty gets a 1.7 speedup at 2 processors and about a 2.5 speedup
>>at 4 processors at crucial moments in the game (when the score drops a
>>little), the speedup at 8 processors for these random-splitting
>>programs is horrible;
>
>Crafty runs consistently at > 3.0 speedup at 4 processors.  I have posted
>the data for several positional/tactical test positions that clearly prove
>this.
>
>Creating numbers out of the clear blue simply is not productive.

You can see it even by watching the whispers from Crafty.
In some critical positions it gets 11 ply, while in all the other positions
it reaches a 14 to 15 ply search depth. Everyone who watches
the whispers can do the math already.
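
For reference, the speedup figures disputed above can be expressed as parallel efficiency (speedup divided by processor count). This is only a sketch restating the numbers claimed in this thread. It does not settle which claim is right, and `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


# (label, claimed speedup, processor count), taken from the posts above.
claims = [
    ("Diepeveen: Crafty, 2 CPUs", 1.7, 2),
    ("Diepeveen: Crafty, 4 CPUs, critical positions", 2.5, 4),
    ("Hyatt: Crafty, 4 CPUs (stated as a lower bound)", 3.0, 4),
]

efficiencies = {
    label: parallel_efficiency(speedup, n) for label, speedup, n in claims
}

for label, eff in efficiencies.items():
    print(f"{label}: {eff:.1%}")
```

So the disagreement at 4 processors is between roughly 62.5% efficiency in critical positions and at least 75% overall.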

>
>>
>>in some positions you get 10 times speedup, in other positions a
>>2 times speedup. When you need the speedup you don't get it.
>
>Perhaps not all programs behave this badly?

Crafty does, but that's logical. You split *at random*. So
if a position goes bad, you split badly. And if you split badly,
the chance is statistically higher that you again split worse and worse.

If it goes great, then statistically there is a chance it goes even
better with the splitting.

>
>>
>>Anyway this is all a theoretical discussion. I am pretty sure ChessBase
>>doesn't want to buy an 8-way Xeon system, even though they can afford
>>the $100k easily.
>>
>>With regard to memory I need to mention that memory is faster on
>>these systems than on our dual systems, which are slow with respect to
>>memory: memory access goes in parallel on the big machines, and it
>>doesn't on dual machines.
>
>However, the 4-way and 8-way boxes share the _same_ memory system.

I can imagine that on a quad, where DIEP also has *zero* latency
problems with the memory, the problems are way smaller than
on a dual 2 GHz K7.

>
>>
>>Best regards,
>>Vincent
>>
>>>In absolute terms, the 8-way Pentium 3 Xeon systems are only 44% faster than the
>>>4-way ones, which means that with the 4 extra CPUs, the system only gets 1.76
>>>CPUs worth of extra performance, which is poor value for money. This level of
>>>scalability is not that surprising since each group of 4 CPUs share 0.8GByte/s
>>>of memory bandwidth. As a side note, it seems likely though that 252.eon fits
>>>almost perfectly into the 2MByte cache the Pentium 3 Xeons have as it gets
>>>nearly linear scalability - the higher the cache hit rate, the less main memory
>>>is needed, which leaves more for the other CPUs.
>>>
>>>Even worse, in some tests, the 8-way system actually does worse than the 4-way
>>>system, and this could possibly be due to differences in the chipsets or because
>>>the extra contention itself on the shared Pentium system bus causes efficiency
>>>to drop. It's unlikely that the compilers/OS would have made much difference as
>>>for each CPU type the tests were done at similar times with the same compilers.
>>>
>>>
>>>http://www.aceshardware.com/read.jsp?id=45000338
>>>
>>>kburcham
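
The arithmetic in the quoted passage can be checked with a short sketch. It assumes, as the passage implicitly does, that the 4-way system counts as 4 CPUs' worth of performance. All variable names are illustrative.

```python
four_way_cpus = 4
eight_way_cpus = 8
relative_gain = 0.44  # the 8-way system is quoted as 44% faster than the 4-way

# Assumption: the 4-way system delivers exactly 4 "CPU equivalents".
eight_way_equivalents = four_way_cpus * (1 + relative_gain)

# Performance added by the 4 extra CPUs, in CPU equivalents.
extra_equivalents = eight_way_equivalents - four_way_cpus

# Fraction of each extra CPU that turns into real performance.
marginal_efficiency = extra_equivalents / (eight_way_cpus - four_way_cpus)

# Quoted memory bandwidth shared by each group of 4 CPUs, in GByte/s.
bandwidth_per_group = 0.8
bandwidth_per_cpu = bandwidth_per_group / 4

print(f"extra CPU equivalents: {extra_equivalents:.2f}")
print(f"marginal efficiency:   {marginal_efficiency:.0%}")
print(f"bandwidth per CPU:     {bandwidth_per_cpu:.1f} GByte/s")
```

This reproduces the quoted 1.76 CPUs of extra performance: each of the 4 added processors contributes about 44% of a CPU, and each CPU in a group gets about 0.2 GByte/s of the shared bandwidth.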



Last modified: Thu, 15 Apr 21 08:11:13 -0700
