Author: Robert Hyatt
Date: 12:12:40 10/05/01
On October 05, 2001 at 14:21:43, Miguel A. Ballicora wrote:

>On October 05, 2001 at 13:37:19, Robert Hyatt wrote:
>
>>On October 05, 2001 at 13:27:04, Miguel A. Ballicora wrote:
>>
>>>On October 05, 2001 at 00:53:07, Robert Hyatt wrote:
>>>
>>>>>>compare that to a machine with twice the resources that will run that
>>>>>>algorithm well. That isn't what this algorithm analysis is about. It is _very_
>>>>>>safe to assume that the dual-cpu machine and the single-cpu-running-two-threads
>>>>>>cases are _identical_ in their overhead costs from the machine perspective.
>>>>>>Which means it can be ignored...
>>>>>
>>>>>I do not see why it is safe. Particularly, when you don't know how the
>>>>>hardware is going to be in 10 years. 10 years ago branch misprediction
>>>>>was not an issue, today is. Can you guarantee that ignoring that overhead
>>>>>is safe in the future generation of computers? yes or no?.
>>>>>
>>>>>Miguel
>>>>
>>>>I can guarantee you that future machines will have _worse_ memory bottlenecks.
>>>>And worse bus conflicts. That will always be more expensive than context
>>>>switching. New cpus will have zero context switching overhead as they will
>>>>begin to do threading inside the processor...
>>>
>>>Then, regarding this point it all comes to how well you predict the architecture
>>>of the processors of the future.
>>>
>>>Miguel
>>
>>Not really. I claim, and this is based on actual tests, that the parallel
>>processing overhead will get worse, while the context-switching overhead will
>>become smaller. Based solely on what chip designers are doing now for future
>>generation chips.
>>
>>But even if these two overheads remain static, it doesn't change the argument
>>about the super-linear discussion at all. Everything must remain static except
>>for the addition of a second processor, for this comparison to be reasonable.
>>That is a common flaw in many reported parallel speedups. IE the Waycool guys
>>at CalTech added another processor _and_ another block of memory. And they
>>didn't separate the gain from the additional processor from the gain from the
>>additional hash table size...
>
>Wait just a second! you are not doubling the resources then?? then you
>do not duplicate the worker, you give him another brain but not double arms
>and legs. Ok.
>Anyway, this means that if you double all the resources you admit
>that a super linear speed up is possible (please see level (1)
>in my other post)? yes or no? Let's agree in the question first.
>
>Miguel

NO one disagrees on that. We are talking about two processors vs. one processor, with _everything_ else the same. In the case of cpus, sure, you get 8 registers on one intel cpu and 16 on another. You get one cache on a single-cpu machine, two on a dual. You get one TLB on one, two on a dual. But that is all, and those things don't have any influence on the _algorithm_. More memory definitely does. Just take any program and run it with a 4M hash table, then an 8M hash table; you will see the difference even though you use only a one-processor machine (a toy sketch of this is at the end of this post).

So, back to the original discussion:

"If you take a sequential (serial) algorithm and run it on a machine with one processor, and then turn that algorithm into a parallel algorithm and run it on an identical machine except with two processors rather than one, then a super-linear speedup is impossible on a representative collection of test data sets. Super-linear is always possible on some test cases, assuming that the basic sequential algorithm is not optimal.
And since we know that our move ordering is not perfect, serendipity will occasionally let a parallel algorithm improve the move ordering and produce a super-linear result, but _not_ on the typical cases. If that happens, it means there is a better serial algorithm as well, namely the time-slicing approach I have mentioned multiple times (sketched at the end of this post).

There is one trivial exception, which has to do with deliberately writing code that causes a 'register jam' or the like on a specific processor. One trivial example would be to take one of my quad xeon processors, with 1M of L2 cache, and run an algorithm that has 2M of memory data to pass over, over and over. Of _course_ it will run more than 2x faster on a dual, because the parallel algorithm might be able to load half the data into one cache and the other half into the other. But then again, if I go buy a xeon with 2M of L2, that advantage disappears and we are back to square zero. In short, architectural features do not count, particularly when we know the characteristics of the algorithms being used and we _know_ that such effects are not important in the case we are interested in, namely alpha/beta minimax search."

So that is the "context" for any valid super-linear speedup discussion. It is not about trying to fabricate an algorithm or test case that runs out of hardware resources on the single-cpu machine but not on the dual. IE there are plenty of machines with 32 general registers, and they won't suffer like the intel processors do with only 8. And then there are the crays with hundreds of registers...

>
>
>>
>>Here we fix _everything_ except the number of cpus...
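To make the memory point concrete, here is a toy C sketch. It is purely illustrative: it is not chess code, and the recurrence and table are made up, not how any real engine's hash table works. It memoizes a small recursion in a direct-mapped, fixed-size table (loosely analogous to a transposition table) and counts how much work the same serial algorithm does as the table size is doubled, all on a single "processor":

#include <stdio.h>

#define N 60                /* toy problem size */

static long calls;          /* stand-in for "nodes searched" */

/*
 * Toy recurrence f(n) = f(n-1) + f(n/2), cached in a direct-mapped table
 * of 'size' slots.  key[slot] records which n the cached value belongs to;
 * when a later store overwrites an entry, the lost subproblem must be
 * recomputed, just as an overwritten hash entry forces a re-search.
 */
static long f(int n, int *key, long *val, int size) {
    calls++;
    if (n < 2)
        return 1;
    int slot = n % size;
    if (key[slot] == n)
        return val[slot];
    long r = f(n - 1, key, val, size) + f(n / 2, key, val, size);
    key[slot] = n;
    val[slot] = r;
    return r;
}

static long run(int size) {
    int  key[64];
    long val[64];
    for (int i = 0; i < 64; i++)
        key[i] = -1;                      /* mark every slot empty */
    calls = 0;
    (void) f(N, key, val, size);
    return calls;
}

int main(void) {
    /* Same serial algorithm on one "cpu"; only the table size changes,
       and the amount of work done generally changes with it. */
    for (int size = 8; size <= 64; size *= 2)
        printf("%2d slots: %ld calls\n", size, run(size));
    return 0;
}

The call counts drop as the table grows because fewer entries are lost to overwrites, which is exactly why doubling memory along with doubling processors confuses the speedup measurement.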
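And here is a minimal sketch of the time-slicing idea, again purely illustrative and with made-up names (search_some_nodes, the Task struct, and the node counts are stand-ins, not real search code). A single cpu alternates between two independent search tasks every SLICE nodes, so whatever luck a two-cpu search has in reaching a useful result in the second subtree early is available to a serial program as well:

#include <stdio.h>

#define SLICE 1000L     /* nodes searched before the single cpu switches tasks */

/* One independent search task, e.g. one root move's subtree. */
typedef struct {
    const char *name;
    long remaining;     /* toy stand-in for the size of the subtree           */
    long searched;      /* nodes this task has searched so far                */
    long finished_at;   /* total nodes (both tasks) when this task completed  */
} Task;

/* Toy stand-in for "search up to 'budget' more nodes of this subtree".
   Returns nonzero once the subtree is exhausted. */
static int search_some_nodes(Task *t, long budget) {
    long n = (t->remaining < budget) ? t->remaining : budget;
    t->remaining -= n;
    t->searched  += n;
    return t->remaining == 0;
}

/* A single cpu alternates between the two tasks, just as two cpus would each
   take one.  Any move-ordering luck the parallel search could stumble into is
   available to this serial interleaving as well, delayed only by the slice
   granularity. */
static void time_sliced_search(Task *a, Task *b) {
    int a_done = 0, b_done = 0;
    while (!a_done || !b_done) {
        if (!a_done) {
            a_done = search_some_nodes(a, SLICE);
            if (a_done) a->finished_at = a->searched + b->searched;
        }
        if (!b_done) {
            b_done = search_some_nodes(b, SLICE);
            if (b_done) b->finished_at = a->searched + b->searched;
        }
    }
}

int main(void) {
    Task big   = { "root move 1", 500000L, 0, 0 };
    Task small = { "root move 2",  20000L, 0, 0 };
    time_sliced_search(&big, &small);
    /* The small subtree is finished after 40000 total nodes instead of
       waiting behind the big one's 500000, which is the same "early answer"
       a second cpu would have delivered. */
    printf("%s finished after %ld total nodes\n", big.name,   big.finished_at);
    printf("%s finished after %ld total nodes\n", small.name, small.finished_at);
    return 0;
}

If interleaving like this ever beats the plain serial search by more than the number of cpus would explain, the interleaving itself is the better serial algorithm, which is the whole point of the super-linear argument above.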