Computer Chess Club Archives


Subject: Re: 64-way Parallel FP Chip

Author: Vincent Diepeveen

Date: 09:21:17 10/15/03



On October 15, 2003 at 10:41:37, Robert Hyatt wrote:

Bob,

go to www.cray.com and see how many instructions this thing can put through a
clock :)

Nowadays they run at 1 GHz and have a 256KB cache too.

But just imagine that they would not be able to do 29 instructions a clock but
just 1 or 2.

Then any x86 processor blows them away at floating point :)

>On October 15, 2003 at 10:01:44, Vincent Diepeveen wrote:
>
>>On October 15, 2003 at 09:32:10, Robert Hyatt wrote:
>>
>>>On October 14, 2003 at 17:29:18, Vincent Diepeveen wrote:
>>>
>>>>On October 14, 2003 at 16:18:28, Robert Hyatt wrote:
>>>>
>>>>>On October 14, 2003 at 14:29:36, Gerd Isenberg wrote:
>>>>>
>>>>>>On October 14, 2003 at 14:15:33, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On October 14, 2003 at 14:13:08, Gerd Isenberg wrote:
>>>>>>>
>>>>>>>>On October 14, 2003 at 10:07:10, Ricardo Gibert wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>http://www.wired.com/news/technology/0,1282,60791,00.html
>>>>>>>>>
>>>>>>>>>Can this be productively used in a chess program?
>>>>>>>>
>>>>>>>>I don't know; similar hardware resources may be more productive for chess if
>>>>>>>>implemented as hyperthreading devices. I guess it's a kind of further
>>>>>>>>development of SSE and AltiVec technology. With huge register files
>>>>>>>>(N * 64 * 64|128|256-bit?), probably SIMD-wise integer instructions
>>>>>>>>(including popcount?) and a fast memory interface, I can imagine that it is
>>>>>>>>useful for a lot of nice things, like some eval passes, e.g. a first
>>>>>>>>square-wise pass and a final scalar product pass. And fill-attack generation,
>>>>>>>>e.g. square-wise in all 16 directions with a specialized dumb fill routine.
>>>>>>>>
>>>>>>>>Gerd
>>>>>>>
>>>>>>>this is just floating point arrays.
>>>>>>
>>>>>>Aha, well, maybe a matter of interpretation.
>>>>>>I haven't seen any instruction set yet.
>>>>>>
>>>>>>On the other hand, if float and double arithmetic becomes as fast (or faster) as
>>>>>>integer, why not use it for eval purposes?
>>>>>>
>>>>>>Gerd
>>>>>
>>>>>
>>>>>Correct.  We did this on the Cray.  FP was very fast there and it frees
>>>>>up integer registers for addresses and array indices...
>>>>
>>>>That's of course true; however, with 16 processors at 100 MHz you reached
>>>>500k nodes a second with Cray Blitz.
>>>>
>>>>Each Cray processor can issue up to 29 instructions a cycle.
>>>
>>>I have no idea what you are talking about.  Each cray processor can issue
>>>_one_ instruction per cycle.
>>>
>>>however, doing vector stuff, in one cycle the machine can do four memory
>>>reads and two memory writes (8 byte words) per processor.  It can also do
>>>multiple things in one cycle with vector chaining, but it never issues more
>>>than one instruction per cycle per cpu.
>>>
>>>I don't know what data you are looking at, but it is wrong.
>>>
>>>>
>>>>Crafty at a 1.6Ghz K7 which can issue up to 3 instructions a cycle gets 1
>>>>million nodes a second.
>>>>
>>>>So on something capable of 100M * 16 * 29 = 46.4G instructions a second you
>>>>get 500k nps, because it is a vector machine.
>>
>>Bob cut the crap.
>>
>>If cray would execute 1 instruction a cycle then the processors would be
>>10 times slower than any other solution.
>
>Vincent, wanna make a bet?  Any amount of money you care to put on it.
>
>The cray issues one instruction per cycle.  Of course, you have _no idea_
>of what a vector machine does and how it does it, so you aren't going to
>understand anything about the machine.  But one instruction per cycle per
>processor is _it_.
>
>You can find this in any good Cray Reference.  I'll be happy to xerox a page
>from the C90 hardware reference manual that gives this info.
>
>Next, do you understand the difference between an _instruction_ and an
>_operation_?  Didn't think so.  The cray has a set of vector instructions
>where _one_ instruction produces multiple results by operating on a vector.
>But it can't _issue_ more than one instruction per cycle.  It is possible that
>by issuing multiple consecutive instructions, you "chain" vector functional
>units together and produce multiple _operations_ per cycle.  But _not_
>multiple instructions.
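
The instruction/operation distinction drawn above can be illustrated with a toy
count. This is only a minimal sketch, not a model of any real Cray: the vector
length of 64 and the function names are assumptions made for illustration and
do not come from the thread.

```python
# Toy illustration: one vector instruction yields many operations.
# VECTOR_LENGTH = 64 is an assumed value, not a figure from the thread.
VECTOR_LENGTH = 64


def scalar_add_counts(n_elements):
    """Adding n pairs with scalar instructions: one instruction per result."""
    instructions = n_elements
    operations = n_elements
    return instructions, operations


def vector_add_counts(n_elements, vector_length=VECTOR_LENGTH):
    """Adding n pairs with vector instructions: one instruction per vector."""
    instructions = -(-n_elements // vector_length)  # ceiling division
    operations = n_elements
    return instructions, operations


if __name__ == "__main__":
    print(scalar_add_counts(256))  # (256, 256)
    print(vector_add_counts(256))  # (4, 256)
```

Both versions perform the same number of floating point operations; only the
number of instructions that have to be issued differs.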
>
>Why don't you try to talk about something you know something about, if there
>is such a topic?  And stop trying to talk "cray" to someone that has actually
>_used_ them for 20+ years?
>
>>
>>Yet everyone loves crays because they are vector processors which can do up to
>>29 instructions a cycle.
>
>Nope.
>
>One instruction per cycle.  Try this on for size:
>
>Cray Y-MP C90 System Programmer Reference Manual, CSM-0500-000
>
>"A fetch sequence begins immediately and transfers a block of instructions
>from memory to an instruction buffer.  The issue sequence then selects the
>instruction indicated by the program address (P) register, decodes it,
>determines whether the required registers or functional units are available,
>and if so, allows the instruction to be executed.
>
>As the instruction executes, the P register increments, causing a new
>instruction to be selected from the instruction buffer."
>
>The above happens _once_ per processor cycle.
>
>Again, you don't understand what vector processing is all about.
>
>>
>>Even a P5/100 would have been faster than a cray because it can do 2
>>instructions a clock at 100Mhz.
>
>
>So?  How long would it take that P5/100 to execute (say) a floating point
>add?  The cray does one in 3 cycles.  But if it is a vector instruction,
>after the first result pops out after 3 cycles, the next result pops out
>one cycle later, and this continues until the vector has been completely
>processed.
>
>Can your P5 do one floating add per cycle?  Didn't think so.  After you
>issue several floating point vector instructions (here is an example):
>
>            v0     v1+v2
>            v3     v4+v5
>            v6     v0*v3
>
>After three cycles, we have three instructions being executed, one issued
>per cycle.  After three cycles, the first v0 value is completed and a new
>one is completed every cycle after that.  After 4 cycles, the first v3 value
>is completed and one is completed every cycle after that.  After 8 cycles,
>the first v6 value is completed and one every cycle after that.  From this
>point forward, we are doing two floating adds and one floating multiply
>every clock cycle.  Can your P5 do that?
>
>The cray was _not_ a fast scalar machine.  Again something you don't understand.
>It _is_ one hell of a fast vector machine, if you would only look.
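
The timing described above can be replayed with a small model. This is a
sketch under stated assumptions: the first-result cycles (3, 4 and 8) are the
ones quoted in the post, while the vector length of 64 and all names in the
code are hypothetical. It does not model real Cray functional unit latencies.

```python
# Hypothetical timing model built from the numbers quoted in the post:
# first v0 result after 3 cycles, first v3 after 4, first v6 after 8,
# then one result per cycle from each instruction until its vector is done.
FIRST_RESULT_CYCLE = {"v0 = v1+v2": 3, "v3 = v4+v5": 4, "v6 = v0*v3": 8}
VECTOR_LENGTH = 64  # assumed, not stated in the post


def results_completed(first_cycle, cycle, vector_length=VECTOR_LENGTH):
    """Results finished by the end of `cycle` for one vector instruction."""
    if cycle < first_cycle:
        return 0
    return min(cycle - first_cycle + 1, vector_length)


def ops_in_cycle(cycle):
    """Floating point results completed during exactly this cycle."""
    return sum(
        results_completed(first, cycle) - results_completed(first, cycle - 1)
        for first in FIRST_RESULT_CYCLE.values()
    )


if __name__ == "__main__":
    for cycle in (2, 3, 4, 8, 20):
        print(cycle, ops_in_cycle(cycle))
```

From cycle 8 onward the model reports three results per cycle (two adds and
one multiply), even though only three instructions were ever issued, one per
cycle.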
>
>>
>>You know that a Cray can do 29 and so do I. So cut this incredible nonsense
>>right here.
>
>You are producing the nonsense.  I just quoted _directly_ from the C90
>manual and that was the machine you were quoting for my 500K nodes per
>second.
>
>
>>
>>If you had vectorized Cray Blitz correctly, it would of course have run
>>faster than 500k nps: more like 5 MLN nps on a 16-processor 100 MHz Cray.
>
>you have no idea what "vectorized CB correctly" means, obviously, since you
>don't have a clue what "vectorized" means as you have shown many times over
>the past 8 years.
>
>Grow up and learn to understand before spouting nonsense.
>
>
>>
>>Thank you,
>>Vincent
>>
>>>Again, you make up numbers that have nothing to do with reality.  A Cray
>>>can issue one instruction per cycle.  The C90 I used for the ICCA DTS
>>>article had a clock cycle time of 4.167 nanoseconds, the standard C90 clock
>>>speed.  That is about 250 million instructions per second per processor.  With
>>>16 processors, that is 4 billion, not your mythical 46.4 billion.  How about
>>>you start writing about things you know something about, and stop making stuff
>>>up about things you don't have a clue about?
>>>
>>>>
>>>>On something capable of 4.8G instructions a second you get 1 MLN nps,
>>>>because it is an x86 processor.
>>>
>>>
>>>Pure garbage calculations don't convince anybody of anything.
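
The peak issue rates argued over in the quoted exchange are plain
multiplication and can be checked directly. This is a minimal sketch: the
inputs are the figures quoted in the thread, and the function and variable
names are made up for illustration. Note that a 4.167 ns cycle works out to
roughly 240 MHz, which the post rounds to "about 250 million" per processor
and "4 billion" for 16 processors.

```python
def issue_rate_per_second(clock_hz, processors, instructions_per_cycle):
    """Peak instructions issued per second across all processors."""
    return clock_hz * processors * instructions_per_cycle


# C90 clock derived from the 4.167 ns cycle time quoted in the thread.
C90_CYCLE_SECONDS = 4.167e-9
c90_clock_hz = 1.0 / C90_CYCLE_SECONDS  # roughly 240 MHz

# The disputed figure: 100 MHz, 16 processors, 29 instructions per cycle.
claimed = issue_rate_per_second(100e6, 16, 29)

# One instruction per cycle per processor on a 16-processor C90.
one_per_cycle = issue_rate_per_second(c90_clock_hz, 16, 1)

# The 1.6 GHz K7 figure: up to 3 instructions per cycle on one processor.
k7 = issue_rate_per_second(1.6e9, 1, 3)

if __name__ == "__main__":
    print(f"claimed Cray peak:     {claimed / 1e9:.1f} G instructions/s")
    print(f"one-per-cycle C90:     {one_per_cycle / 1e9:.2f} G instructions/s")
    print(f"1.6 GHz K7, 3 per clk: {k7 / 1e9:.1f} G instructions/s")
```

These are peak issue rates only; they say nothing about how many floating
point results a vector instruction delivers once it has been issued.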




Last modified: Thu, 15 Apr 21 08:11:13 -0700
