Computer Chess Club Archives



Subject: Re: New crap statement ? Perpetuum mobile

Author: Miguel A. Ballicora

Date: 07:35:47 10/07/01

On October 05, 2001 at 15:26:44, Robert Hyatt wrote:

[snip]

>>There are several levels of discussion here, and there were jumps from one
>>level to another. That made the discussion impossible to follow.
>>
>>Level 1) Given a problem and a sequential algorithm (accepted as good), once
>>the algorithm is "parallelized" it is possible to observe a constant
>>superlinear speedup. "Parallelized" means to adapt the sequential algorithm
>>so it will work in parallel (two threads with double the resources). Of
>>course, technically it is now a different algorithm, but let's say that the
>>changes were kept to a minimum so as not to break the spirit of the original
>>sequential algorithm.
>
>That was the original premise...
>
>
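
For concreteness, the Level 1 setup might look like the following minimal
sketch (all names are mine; a plain array scan stands in for the sequential
algorithm, split with minimal changes across two POSIX threads):

#include <pthread.h>
#include <stddef.h>

typedef struct { const int *a; size_t n; int target; int hit; } Half;

/* the original sequential loop, reused unchanged by both threads */
static void *scan(void *p) {
    Half *h = (Half *)p;
    for (size_t i = 0; i < h->n; i++)
        if (h->a[i] == h->target) { h->hit = 1; break; }
    return NULL;
}

/* "parallelized" with minimal changes: two threads, half the data each */
int parallel_scan(const int *a, size_t n, int target) {
    Half lo = { a,         n / 2,     target, 0 };
    Half hi = { a + n / 2, n - n / 2, target, 0 };
    pthread_t t;
    pthread_create(&t, NULL, scan, &hi);   /* second processor */
    scan(&lo);                             /* first processor  */
    pthread_join(t, NULL);
    return lo.hit || hi.hit;
}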
>>
>>Level 2a) Once this is observed, it is possible to change the original
>>sequential algorithm to make it work better, learning from the parallel
>>version. However, it is possible that this modification introduces new code
>>whose overhead is worse than the overhead caused by doubling the resources.
>>Even though the modified sequential version is better than the original, it
>>is possible that it cannot completely remove the superlinear speedup,
>>leaving a residual but measurable difference.
>
>That last sentence is incorrect.  It is absolutely guaranteed that the super-
>linear speedup will be removed, leaving _at best_ no super-linear speedup,
>and more likely a deficit.  Running a dual processor has inherent overhead
>that a single processor does not: a shared bus, shared memory with conflicts,
>and the necessity of using atomic locks that can be totally avoided in the
>serial implementation of the parallel algorithm.
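
To make that point concrete, here is a minimal sketch (my own, hypothetical
names) of the kind of lock the threaded version pays for on every update of
shared state, and that the serial run of the same algorithm never needs:

#include <pthread.h>

static long nodes_searched = 0;
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

/* threaded version: every update of shared state pays for a lock */
void count_node_parallel(void) {
    pthread_mutex_lock(&node_lock);
    nodes_searched++;
    pthread_mutex_unlock(&node_lock);
}

/* serial implementation of the same algorithm: no lock at all */
void count_node_serial(void) {
    nodes_searched++;
}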
>>
>>Level 2b) Even if rewriting the sequential version removes the superlinear
>>speedup, making it 2.0, the rewriting could be extremely inconvenient, and
>>in practice it is better to live with that, considering that Level 3 might
>>solve the issue.
>
>That is not an issue _at all_.  There is no limit placed on "how hard it
>might be to rewrite."  For the case I have mentioned, it is trivial to
>search two different parts of the tree using one kernel process.  I debugged
>like that for years with Cray Blitz.

>>Level 3) Since rewriting the sequential algorithm might not speed it up, or
>>might be very inconvenient, a suitable alternative could be to time-slice
>>the CPU and run the parallelized version on one processor, making the
>>speedup exactly 2.0. However, a balance of the overheads might make the
>>speedup just above 2.0.
>
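One way to picture Level 3 without relying on the OS scheduler is to
interleave the two logical threads by hand, taking a binary-tree search as
the stand-in. A minimal sketch (all types and names are mine, assuming a
proper binary tree of bounded depth whose root forks into two non-empty
subtrees):

#include <stddef.h>

typedef struct Node { struct Node *left, *right; int has_key; } Node;

/* one logical "thread": an explicit-stack, right-first walk */
typedef struct { Node *stack[64]; int top; } Walker;  /* depth <= 64 assumed */

/* expand one node from this walker; return 1 if it holds the key */
static int step(Walker *w) {
    Node *n = w->stack[w->top--];
    if (n->has_key) return 1;
    if (n->left)  w->stack[++w->top] = n->left;
    if (n->right) w->stack[++w->top] = n->right;  /* right pops first */
    return 0;
}

/* one processor, two logical threads, one node per "time slice" each */
int search_time_sliced(Node *root) {
    Walker a = { { root->left  }, 0 };
    Walker b = { { root->right }, 0 };
    if (root->has_key) return 1;
    while (a.top >= 0 || b.top >= 0) {
        if (a.top >= 0 && step(&a)) return 1;
        if (b.top >= 0 && step(&b)) return 1;
    }
    return 0;
}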

[snip]

>>So, these are the levels of the discussion, and I believe I did not leave
>>any of them out. I believe that 1, 2a and 2b cannot be proven impossible.
>>Regarding 3, Bob guarantees that the hardware of the future will never have
>>those characteristics. I cannot read the future, so I won't discuss this
>>point any further and will stick with 1 and 2 for now.
>>
>>Regards,
>>Miguel

[snip]

>I am waiting for someone to give an algorithm example that supports either
>1, 2a or 2b happening.  I happen to know they are impossible.  Using today's

The point is whether a demonstration exists that 1 is impossible. The fact
that you have not seen an example is not exactly the point. Anyway, I think
that I can find one example of 1. At least I'll try. First, would you
consider the following to be a good sequential algorithm?

Suppose that I have a "virtual cave" with tunnels shaped like a tree.
At the end of one of the tunnels there is a key (treasure). The goal is to
find it. Each node splits into two branches (this is a fork). Only one leaf
has the key; all the others are empty.
Would you consider it a good algorithm to traverse the tree recursively,
going first to the right, and when I find an empty leaf (or subtree) to
return and continue on the left branch, until I have searched the whole tree
or found the key?
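
In C, that traversal might look like this (just a sketch; the Node type and
the names are mine):

#include <stddef.h>

/* a tunnel fork; only one leaf in the whole cave has the key */
typedef struct Node { struct Node *left, *right; int has_key; } Node;

/* return 1 as soon as the key is found anywhere under n */
int find_key(const Node *n) {
    if (n == NULL)  return 0;
    if (n->has_key) return 1;
    if (find_key(n->right)) return 1;   /* right branch first */
    return find_key(n->left);           /* then back out and go left */
}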

Regards,
Miguel

>hardware.  Because I have measured the two kinds of overhead since it is an
>important consideration in measuring parallel performance.  And once you know
>that the "time-slicing" overhead is less on all existing machines we are
>discussing, there is no real point in drifting off to "what if the hardware
>could do this?..." type discussions.
>
>super-linear is simply a myth for the average case.  For the exceptional
>case, it is unusual but not non-existent.  But in the average case, there
>will never be any sort of super-linear speedup unless there is an
>intentional attempt to take advantage of a hardware issue.  Today there is
>no real hardware issue that can be taken advantage of without it being
>obvious, i.e. 2x L2 cache, etc.  And _none_ of those issues apply to
>alpha/beta, as the tree is too large to make that an issue.  Most chess
>programs won't even fit completely into L2, much less their search space
>data.
>
>>
>>>
>>>bruce


