Author: Robert Hyatt
Date: 09:53:44 05/28/02
On May 28, 2002 at 10:06:42, Vincent Diepeveen wrote:

>On May 28, 2002 at 09:06:36, K. Burcham wrote:
>
>for computerchess that is way too optimistic Kim.
>
>programs like Cray Blitz or DIEP might do pretty well at
>8 processors, but crafty, fritz, sos, shredder, patzer,
>junior and these programs
>scale pretty bad at 8 processors.

What on earth are you talking about when you mention Crafty? I have run crafty
on 16 cpu machines and it works just as well as it does on 4... From actual
testing, not "speculation".

>
>bandwidth is not the issue here. speedup is the issue here.
>

For the 8-way machines, bandwidth _is_ the issue. 4-way boxes use 4-way
interleaving to provide enough memory bandwidth for 4 cpus. 8-way boxes lose
in two ways. (1) they still use 4-way memory interleaving; (2) the cache
coherency hardware treats the machine as two "clusters" of 4 cpus, making
"inter-cluster" cache coherency less efficient than on the 4-way clusters...

>If you split at random like most of these programs do, then
>you have simply major speedup problems soon.
>
>In case of patzer a big issue is that it is tactical extending
>a lot, so the search space is not identical (i don't even
>know whether it runs at 8 processors).
>
>So where crafty gets 1.7 speedup at 2 processors and like 2.5 speedup
>at 4 processors at crucial moments (when score drops a little) in
>the game, there the speedup at 8 processors for these random
>splitting programs is very horrible at 8 processors;

Crafty runs consistently at > 3.0 speedup at 4 processors. I have posted the
data for several positional/tactical test positions that clearly proves this.
Creating numbers out of the clear blue simply is not productive.

>
>in some positions you get 10 times speedup, in other positions a
>2 times speedup. When you need the speedup you don't get it.

Perhaps not all programs behave this badly?

>
>Anyway this is all theoretic discussion. I am pretty sure chessbase
>doesn't want to buy a 8 way Xeon system, even though they can afford
>the $100k easily.
>
>With regard to memory i need to mention that memory is faster on
>these systems than at our slow dual systems (with respect to memory),
>memory goes in parallel at the big machines, it doesn't at dual
>machines.

However, the 4-way and 8-way boxes share the _same_ memory system.

>
>Best regards,
>Vincent
>
>>In absolute terms, the 8-way Pentium 3 Xeon systems are only 44% faster than the
>>4-way ones, which means that with the 4 extra CPUs, the system only gets 1.76
>>CPUs worth of extra performance, which is poor value for money. This level of
>>scalability is not that surprising since each group of 4 CPUs share 0.8GByte/s
>>of memory bandwidth. As a side note, it seems likely though that 252.eon fits
>>almost perfectly into the 2MByte cache the Pentium 3 Xeons have as it gets
>>nearly linear scalability - the higher the cache hit rate, the less main memory
>>is needed, which leaves more for the other CPUs.
>>
>>Even worse, in some tests, the 8-way system actually does worse than the 4-way
>>system, and this could possibly be due to differences in the chipsets or because
>>the extra contention itself on the shared Pentium system bus causes efficiency
>>to drop. It's unlikely that the compilers/OS would have made much difference as
>>for each CPU type the tests were done at similar times with the same compilers.
>>
>>
>>http://www.aceshardware.com/read.jsp?id=45000338
>>
>>kburcham
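
For readers who want to check the arithmetic behind the figures quoted in this exchange, here is a minimal sketch. It only restates numbers that appear above (1.7 at 2 cpus, 2.5 and 3.0 at 4 cpus, the article's 44% gain for the 8-way box, 0.8GByte/s per group of 4 cpus). The function names are hypothetical, chosen for illustration, and the article's 44% figure comes from its own benchmarks, not from a chess search.

```python
def parallel_efficiency(speedup, cpus):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / cpus


def extra_cpu_equivalents(base_cpus, relative_gain):
    """How many base-system CPUs' worth of work the added CPUs deliver.

    Assumes the base system's throughput counts as `base_cpus` units,
    which is how the quoted article arrives at its 1.76 figure.
    """
    return base_cpus * relative_gain


def bandwidth_per_cpu(shared_gbytes_per_s, cpus_sharing):
    """Even split of a shared memory bus among the CPUs on it."""
    return shared_gbytes_per_s / cpus_sharing


if __name__ == "__main__":
    # Speedup figures claimed in the thread: (label, speedup, cpus)
    claims = [
        ("quoted for crafty, 2 cpus", 1.7, 2),
        ("quoted for crafty, 4 cpus", 2.5, 4),
        ("reported for Crafty, 4 cpus", 3.0, 4),
    ]
    for label, speedup, cpus in claims:
        print(f"{label}: efficiency {parallel_efficiency(speedup, cpus):.3f}")

    # Article figure: the 8-way box is 44% faster than the 4-way box.
    extra = extra_cpu_equivalents(4, 0.44)
    print(f"4 extra cpus deliver about {extra:.2f} cpus of work")
    print(f"efficiency of the added cpus: {parallel_efficiency(extra, 4):.2f}")

    # Article figure: each group of 4 cpus shares 0.8 GByte/s.
    print(f"bandwidth per cpu: {bandwidth_per_cpu(0.8, 4):.2f} GByte/s")
```

The gap between the two 4-cpu claims is easier to see as efficiency: 2.5 is 62.5% of linear, while 3.0 is 75%.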
Last modified: Thu, 15 Apr 21 08:11:13 -0700