Author: Robert Hyatt
Date: 14:18:43 10/04/01
On October 04, 2001 at 14:22:09, J. Wesley Cleveland wrote:

>On October 04, 2001 at 14:03:13, Robert Hyatt wrote:
>
>>On October 04, 2001 at 12:40:53, J. Wesley Cleveland wrote:
>>
>>>Take an iterated function F(a), i.e. F1(a) = a, F2(a) = F(a),
>>>F3(a) = F(F(a)), ... Assume the function is complicated enough to use
>>>all the available registers on your machine and requires the same
>>>number of machine cycles for each iteration. Now find the first
>>>iteration i for which Fi(b) > Fi(a). A serial implementation will
>>>spend a significant amount of time saving and restoring state as it
>>>goes between the two functions, while the two-processor solution will
>>>only have to spend a minimal amount of time synchronizing at the end
>>>of each iteration (due to my second condition).
>>
>>Explain how you are going to do a recursive function (I assume you
>>mean recursive and not iterative above, due to the way you defined the
>>thing) in a parallel manner.
>
>I wrote recursive, but it could be implemented either way.

In the text above I see "iterative", not "recursive".

>The parallel part is that one processor calculates f1(a), f2(a), ...
>while the other processor calculates f1(b), f2(b), ...

In the text above you assume some finite number of registers, and you
assume that on the machine you use, that is an issue. I choose to use a
SPARC, which means that two different functions use two different sets
of registers, and that advantage evaporates. Or I run it on a MIPS and
use 32 registers rather than 8. That kind of thing is pointing out a
limit in the _hardware_, not a super-linear speedup in the parallel
_algorithm_. The _algorithm_ is 100% independent of any hardware
constraints, and that is what we are talking about here. Alpha/Beta
certainly doesn't have that kind of register situation.

>>It would be impossible to calculate f2(a) until you have successfully
>>calculated f1(a). That is a purely sequential algorithm that has no
>>parallel solution at all. Any more than you can parallelize the
>>following loop:
>>
>>a[0] = 127;
>>for (i = 1; i < n; i++)
>>  a[i] = a[i-1] * f(i);
>>
>>because you can't compute a[i] for any i until a[i-1] has been
>>calculated. This is a classic order dependency. I don't see how you
>>could parallelize your function for that same reason, unless I
>>misunderstood what you are asking.
>>
>>Also, if you run out of registers on any machine, you have a minor
>>problem. But if you run out on a serial algorithm, you will also run
>>out on a parallel algorithm, since they are written the same way at
>>that level...
>
>But you have 2 sets of registers for your 2 processors, so each
>processor can keep its function in registers.

So you are assuming some unique architectural restriction? Then you
could also propose a program on a machine that only has 128MB of RAM,
versus a two-processor machine with 256MB of RAM, and 256MB of data.
You are introducing a non-comparable variable into the discussion.

When you discuss algorithms and parallel algorithms, the architectural
issues don't count. Of course a machine with twice as much memory, or
twice as many registers, would tend to be faster than one with less of
either or both. But that has nothing to do with the _algorithm_.
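For concreteness, here is a minimal sketch of the two-processor scheme
being debated, using POSIX threads. Everything in it is an illustrative
assumption, not code from either poster: F is a stand-in iterated map
(any register-hungry function would do), the seeds are arbitrary, and
the barrier at the end of each iteration plays the role of the "minimal
amount of time synchronizing" described above.

    /* Sketch: thread 0 iterates F on a, thread 1 iterates F on b, with
       one barrier per iteration to test the stopping condition.  F and
       the seeds are placeholders.  Compile with: cc -pthread ... */
    #include <pthread.h>
    #include <stdio.h>

    #define MAXITER 1000000

    static double val[2];            /* current F_i(a) and F_i(b)     */
    static volatile int done;        /* set once F_i(b) > F_i(a)      */
    static int first_i;              /* iteration where that happened */
    static pthread_barrier_t bar;

    static double F(double x) {      /* placeholder iterated function */
      return 4.0 * x * (1.0 - x);
    }

    static void *worker(void *arg) {
      int id = *(int *) arg;         /* 0 works on a, 1 works on b    */
      for (int i = 1; i <= MAXITER && !done; i++) {
        val[id] = F(val[id]);
        pthread_barrier_wait(&bar);  /* both iterates of step i ready */
        if (id == 0 && val[1] > val[0]) {
          done = 1;
          first_i = i;
        }
        pthread_barrier_wait(&bar);  /* both threads now see 'done'   */
      }
      return NULL;
    }

    int main(void) {
      int ids[2] = {0, 1};
      pthread_t t[2];
      val[0] = 0.3;                  /* a */
      val[1] = 0.2;                  /* b */
      pthread_barrier_init(&bar, NULL, 2);
      for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
      for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
      pthread_barrier_destroy(&bar);
      if (done)
        printf("first i with F_i(b) > F_i(a): %d\n", first_i);
      return 0;
    }

Run on one processor instead, the same computation interleaves the two
chains in a single instruction stream:

    /* serial version: one loop alternates between the two chains, so
       each chain's working state keeps getting displaced */
    for (int i = 1; i <= MAXITER; i++) {
      va = F(va);
      vb = F(vb);
      if (vb > va) { first_i = i; break; }
    }

which is where the claimed save/restore cost would come from on a
machine with too few registers to hold both chains at once.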