Author: Robert Hyatt
Date: 14:18:43 10/04/01
On October 04, 2001 at 14:22:09, J. Wesley Cleveland wrote:

>On October 04, 2001 at 14:03:13, Robert Hyatt wrote:
>
>>On October 04, 2001 at 12:40:53, J. Wesley Cleveland wrote:
>>
>>>Take an iterated function F(a), i.e. F1(a) = a, F2(a) = F(a),
>>>F3(a) = F(F(a)), ... Assume the function is complicated enough to use
>>>all the available registers on your machine and requires the same
>>>number of machine cycles for each iteration. Now find the first
>>>iteration i for which Fi(b) > Fi(a). A serial implementation will
>>>spend a significant amount of time saving and restoring state as it
>>>goes between the two functions, while the two-processor solution will
>>>only have to spend a minimal amount of time synchronizing at the end
>>>of each iteration (due to my second condition).
>>
>>Explain how you are going to do a recursive function (I assume you
>>mean recursive and not iterative above, due to the way you defined the
>>thing) in a parallel manner.
>
>I wrote recursive, but it could be implemented either way.

In the text above I see "iterative", not "recursive".

>The parallel part is that one processor calculates f1(a), f2(a), ...
>while the other processor calculates f1(b), f2(b), ...

In the text above you assume some finite number of registers, and you
assume that on the machine you use, that is an issue. I choose to use a
SPARC, which means that two different functions use two different sets
of registers, and that advantage evaporates. Or I run it on a MIPS and
use 32 registers rather than 8. That kind of thing is pointing out a
limit in the _hardware_, not a super-linear speedup in the parallel
_algorithm_. The _algorithm_ is 100% independent of any hardware
constraints, and that is what we are talking about here. Alpha/Beta
certainly doesn't have that kind of register situation.

>>It would be impossible to calculate f2(a) until you have successfully
>>calculated f1(a). That is a purely sequential algorithm that has no
>>parallel solution at all. Any more than you can parallelize the
>>following loop:
>>
>>a[0] = 127;
>>for (i = 1; i < n; i++)
>>  a[i] = a[i-1] * f(i);
>>
>>because you can't compute a[i] for any i until a[i-1] has been
>>calculated. This is a classic order dependency. I don't see how you
>>could parallelize your function for that same reason, unless I
>>misunderstood what you are asking.
>>
>>Also, if you run out of registers on any machine, you have a minor
>>problem. But if you run out on a serial algorithm, you will also run
>>out on a parallel algorithm, since they are written the same way at
>>that level...
>
>But you have 2 sets of registers for your 2 processors, so each
>processor can keep its function in registers.

So you are assuming some unique architectural restriction? Then you
could also propose a program on a machine that only has 128MB of RAM,
versus a two-processor machine with 256MB of RAM, and 256MB of data.
You are introducing a non-comparable variable into the discussion.

When you discuss algorithms and parallel algorithms, the architectural
issues don't count. Of course a machine with twice as much memory, or
twice as many registers, would tend to be faster than one with less of
either or both. But that has nothing to do with the _algorithm_.
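For concreteness, here is a minimal sketch of the two-processor scheme
being debated, using POSIX threads. Everything in it is an illustrative
assumption, not code from either poster: F is a stand-in iterated map
(any register-hungry function would do), the seeds are arbitrary, and
the barrier at the end of each iteration plays the role of the "minimal
amount of time synchronizing" described above.

    /* Sketch: thread 0 iterates F on a, thread 1 iterates F on b, with
       one barrier per iteration to test the stopping condition.  F and
       the seeds are placeholders.  Compile with: cc -pthread ... */
    #include <pthread.h>
    #include <stdio.h>

    #define MAXITER 1000000

    static double val[2];            /* current F_i(a) and F_i(b)     */
    static volatile int done;        /* set once F_i(b) > F_i(a)      */
    static int first_i;              /* iteration where that happened */
    static pthread_barrier_t bar;

    static double F(double x) {      /* placeholder iterated function */
      return 4.0 * x * (1.0 - x);
    }

    static void *worker(void *arg) {
      int id = *(int *) arg;         /* 0 works on a, 1 works on b    */
      for (int i = 1; i <= MAXITER && !done; i++) {
        val[id] = F(val[id]);
        pthread_barrier_wait(&bar);  /* both iterates of step i ready */
        if (id == 0 && val[1] > val[0]) {
          done = 1;
          first_i = i;
        }
        pthread_barrier_wait(&bar);  /* both threads now see 'done'   */
      }
      return NULL;
    }

    int main(void) {
      int ids[2] = {0, 1};
      pthread_t t[2];
      val[0] = 0.3;                  /* a */
      val[1] = 0.2;                  /* b */
      pthread_barrier_init(&bar, NULL, 2);
      for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
      for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
      pthread_barrier_destroy(&bar);
      if (done)
        printf("first i with F_i(b) > F_i(a): %d\n", first_i);
      return 0;
    }

Run on one processor instead, the same computation interleaves the two
chains in a single instruction stream:

    /* serial version: one loop alternates between the two chains, so
       each chain's working state keeps getting displaced */
    for (int i = 1; i <= MAXITER; i++) {
      va = F(va);
      vb = F(vb);
      if (vb > va) { first_i = i; break; }
    }

which is where the claimed save/restore cost would come from on a
machine with too few registers to hold both chains at once.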