Computer Chess Club Archives

Subject: Re: Crap statement refuted about parallel speedup

Author: Robert Hyatt
Date: 07:41:36 09/25/01
On September 24, 2001 at 22:48:03, Vincent Diepeveen wrote:

>On September 23, 2001 at 22:59:00, Robert Hyatt wrote:
>
>>On September 23, 2001 at 17:46:28, Vincent Diepeveen wrote:
>>
>>>
>>>This is utter nonsense. I spent hundreds of programming hours trying
>>>to improve move ordering. My move ordering is *better* and uses more code
>>>than most programs, probably more than all of them. Of course I NEED to,
>>>because DIEP's evaluation function is *way* bigger than anyone else's
>>>evaluation function, with the exception of the average chess player's
>>>(half a century ago, the number of patterns a master-class player knows
>>>was already estimated at 100000).
>>>
>>>It is of course way easier to dynamically get a better tree in parallel
>>>than to write zillions of lines of source code, which, by the way, also
>>>slow down the program!
>>>
>>>One thing all of you seem to miss, and which makes it *impossible* for me
>>>to get the same speedup as Bob:
>>>  a) more extensions mean that both processors search less in the same
>>>     space, so the speedup is lower.
>>>
>>>So the default DIEP version has a bad speedup, around 1.8, and that doesn't
>>>seem to get better at slower time controls either; but when I throw all the
>>>dangerous extensions out of it, the speedup *does* get better.
>>>
>>>Note that results from programs like Crafty can never be taken seriously
>>>because of futility pruning. At bigger depths, futility pruning influences
>>>the tree *way* more than a parallel search does.
>>>
>>>Note that for computer chess, any of today's P3s would kick the hell out of
>>>that Cray YMP from 15 years ago. And no, it is *not* even equivalent to
>>>200MHz.
>>>
>>>I have an old paper from Bob here describing the Cray Blitz hardware
>>>from some years ago. Let me find it!
>>
>>You don't need to find it. In 1986 we were running on a machine called the
>>Cray Y-MP. Its clock period was 6 nanoseconds per CPU, which is nearly 200MHz.
>>It had 8 processors, and it could execute at least 2 instructions every clock
>>cycle per CPU... which would blow away any 200MHz machine you care to drag up.
>
>I beg your pardon, but from the Pentium Pro onward, CPUs can do
>3 per clock.


Exactly how much floating point are you doing in your program? So make that
two integer operations per clock _max_, not _average_.

The Crays could do integer vector, 64-bit integer scalar, _and_ 32-bit integer
operations all in the same clock cycle. We had some assembly functions
that had _zero_ wait conditions in them... they executed the maximum possible
number of instructions in a non-blocked stream. Our attack function
was one...



>
>But be my guest. You claim an average speedup of 11.1 on a 16-processor Cray.
>
>That's 11.1 x 0.2GHz = 2.22GHz.


Try again. The 200MHz machine was not the C90. And the C90 was not the fastest
machine when that was written. The T90 was running; we just couldn't get
enough time on it for those ridiculous 1-processor searches.
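Vincent's 2.22GHz figure is just the claimed speedup multiplied by one CPU's clock rate. The sketch below is a minimal illustration, not anything taken from either program: it reproduces that arithmetic and adds the usual parallel efficiency (speedup divided by processor count). The 11.1 speedup on 16 CPUs comes from the thread; the 250MHz alternative assumes the 16-CPU figure refers to the C90 with its 4ns clock, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cpus: int) -> float:
    """Fraction of ideal linear speedup achieved: S / p."""
    return speedup / cpus


def equivalent_clock_ghz(speedup: float, clock_mhz: float) -> float:
    """Back-of-envelope 'equivalent clock': speedup x per-CPU clock, in GHz.

    This treats one clock cycle on any machine as the same amount of work,
    which is exactly the assumption being argued about.
    """
    return speedup * clock_mhz / 1000.0


if __name__ == "__main__":
    speedup, cpus = 11.1, 16
    print(f"efficiency on {cpus} CPUs: {parallel_efficiency(speedup, cpus):.3f}")
    # Hypothetical comparison of the two clock rates mentioned in the thread.
    for label, mhz in (("200MHz (figure used above)", 200.0),
                       ("250MHz (C90, 4ns clock)", 250.0)):
        print(f"{label}: {equivalent_clock_ghz(speedup, mhz):.3f} GHz equivalent")
```

With 16 CPUs the efficiency is about 0.69, and switching the assumed clock from 200MHz to 250MHz moves the "equivalent" figure from 2.22GHz to about 2.78GHz. Neither number accounts for how much work each CPU completes per clock, which is the point in dispute.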



>
>That's 2 instructions a clock at most.

No it isn't. You can start a single vector instruction, then feed in two
parallel instruction streams of 64-bit and 32-bit integer operations while
that vector instruction produces two results every clock cycle. If you chain
vector operations together, it is easy to get 10 results every clock cycle.
And I said _results_... not just raw instructions...



>
>I do not need bandwidth for my chess program. I am happy that today's
>processors do at most 3 instructions per clock. The P4 even does 4 on paper,
>but it has been shown theoretically that the current design cannot do that
>yet (unless Intel has not told the entire truth about the processor,
>which is very possible too).
>
>Note that the 21264 does 4 at most.
>
>I do not know how large the penalty for each misprediction is these days;
>that's not interesting anyway, because there are tools to measure
>the average number of instructions executed per clock.
>
>For Crafty that's around 2 on today's CPUs. So a modern CPU definitely
>isn't slower for it.


Yes it is. We typically produced 5 results per clock on the C90, as measured
with the hardware monitor. That was why we wrote so much assembly for the program.
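The per-machine argument comes down to a product of processor count, clock rate, and work completed per clock. A minimal sketch of that product, using only figures quoted in this thread: 16 C90 CPUs at a 4ns (250MHz) clock producing 5 results per clock, against a dual 1.4GHz AMD running Crafty at about 2 instructions per clock. The function name is hypothetical, and as Bob stresses, a vector "result" and a scalar "instruction" are different units, so this compares peak-style rates, not search speed.

```python
def aggregate_rate(cpus: int, clock_mhz: float, per_clock: float) -> float:
    """Crude machine throughput: CPUs x clock rate x work per clock (per second).

    Ignores parallel search overhead: a speedup of 11.1 on 16 CPUs means the
    search does not scale linearly with this figure.
    """
    return cpus * clock_mhz * 1e6 * per_clock


if __name__ == "__main__":
    # Figures as quoted in the thread (units differ: results vs instructions).
    c90 = aggregate_rate(cpus=16, clock_mhz=250.0, per_clock=5.0)
    dual_pc = aggregate_rate(cpus=2, clock_mhz=1400.0, per_clock=2.0)
    print(f"C90, 16 CPUs  : {c90 / 1e9:.1f} billion results/s")
    print(f"dual 1.4GHz PC: {dual_pc / 1e9:.1f} billion instructions/s")
```

This gives 20 billion results per second for the C90 against 5.6 billion instructions per second for the dual PC, before any parallel search overhead is applied.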



>
>So any of today's dual AMDs, at 2 x 1.4GHz = 2.8GHz, easily beats
>a 2.22GHz 16-processor YMP from 15 or so years ago, which cost
>80 million dollars back then, plus perhaps another 10 million dollars for
>power, support personnel and a cooling water station.

You are mixing things up badly:
  YMP = 6ns clock, 8 processors
  C90 = 4ns clock, 16 processors
  T90 = 2ns clock, 32 processors
And then there was the 1ns Cray-3, but we never had a chance to run on that
in any tournament.
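Converting the clock periods given here into frequencies (f = 1/T) helps keep the models apart. A minimal sketch; the dictionary layout is an assumption, and the Cray-3 is left out because no processor count is given for it (its 1ns period would be 1000MHz). It also shows that the Y-MP's 6ns clock works out to about 167MHz, somewhat under the 200MHz figure used earlier in the thread.

```python
# Clock period (ns) and CPU count per model, as stated in the post.
CRAY_MODELS = {
    "Y-MP": (6.0, 8),
    "C90": (4.0, 16),
    "T90": (2.0, 32),
}


def clock_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz (f = 1/T)."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    for name, (period, cpus) in CRAY_MODELS.items():
        f = clock_mhz(period)
        # Crude aggregate: CPUs x per-CPU clock, ignoring work done per clock.
        print(f"{name:5s} {period:.0f}ns -> {f:6.1f}MHz per CPU, "
              f"{cpus:2d} CPUs, {f * cpus / 1000:.2f}GHz aggregate clock")
```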



>
>There's more about your hardware that has to do with complex instructions.
>
>I will come back to this!


Don't forget that not only was the Cray fast, it also had some instructions that
made many parts of the chess engine run faster, both because of raw speed _and_
because they needed far fewer instructions. Our "in check" function was one
example. We didn't use bitmaps, yet we could answer the "in check" question
_very_ rapidly...



>
>>That old YMP had so much more memory bandwidth than the PCs of today that it
>>really doesn't make sense to compare them.
>>
>>
>>
>>>
>>>
>>>
>>>Best regards,
>>>Vincent
>>>
>>>>Dave
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.