Computer Chess Club Archives


Subject: Re: Opteron Instruction Set

Author: Vincent Diepeveen

Date: 12:59:43 02/03/04



On February 03, 2004 at 15:02:42, Gerd Isenberg wrote:

>On February 03, 2004 at 11:45:20, Vincent Diepeveen wrote:
>
>>On February 03, 2004 at 03:13:29, Gerd Isenberg wrote:
>>
>>>On February 03, 2004 at 01:03:29, Jay Urbanski wrote:
>>>
>>>>On February 02, 2004 at 22:41:19, Robert Hyatt wrote:
>>>>
>>>>>On February 02, 2004 at 20:06:29, David Rasmussen wrote:
>>>>>
>>>>>>Does the Opteron have firstBit, lastBit and popCount instructions? Or at least
>>>>>>something that makes calculating them easier than on x86-32?
>>>>>>
>>>>>>/David
>>>>>
>>>>>
>>>>>Has the same BSF/BSR instructions, but no popcnt that I have found.  Note
>>>>>that BSF/BSR work on 64 bit values if you want.  I have inline asm to do
>>>>>all three for gcc if you are interested.
>>>>
>>>>I understand there is a popcount instruction.  I also understand it's
>>>>undocumented.
>>>
>>>Do you have any opcode or further hints?
>>>That would be great - a 4 cycle vector path popcount ;-)
>>
>>And deadslow.
>
>Yes Vincent, if it exists, 4 is quite too optimistic. I guess it is more in a
>range of 10-40 cycles, bsf is 9. And doing up to four popcounts in parallel as i
>often do with MMX and/or general purpose is probably faster than using 4
>deadslow vector path instructions in a row.
>
>My current SSE2 favourite is:
>
>MASKMOVDQU xmmreg1, xmmreg2 66h 0Fh F7h VectorPath ~ 43 cycles latency ;-)
>(implements a masked conditional write of up to 16 bytes).
>
>But a very interesting SSE2 instruction for eval purposes is:
>
>PMADDWD Packed Multiply Words and Add Doublewords
>Eight 16*16 muls and four 32-bit adds in 4 cycles (double dispatch as most SSE2
>instructions):
>
>c0 = a0*b0 + a1*b1
>c1 = a2*b2 + a3*b3
>c2 = a4*b4 + a5*b5
>c3 = a6*b6 + a7*b7

Is each 16-bit word a real word, i.e. a signed integer [-32768..32767]?
Or is it an unsigned integer?
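
For reference, PMADDWD is documented as multiplying signed 16-bit words and summing adjacent 32-bit products. The following is only a minimal sketch in C to check that, assuming a compiler that provides the SSE2 intrinsic `_mm_madd_epi16` in `<emmintrin.h>`. The function name `madd_words` and the scalar fallback for non-SSE2 builds are illustrative and not from this thread.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper: c[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1] for i = 0..3,
 * with every 16-bit word interpreted as a signed integer.
 */
static void madd_words(const int16_t a[8], const int16_t b[8], int32_t c[4])
{
#if defined(__SSE2__)
    /* Unaligned loads, so the caller's arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_madd_epi16(va, vb);   /* compiles to PMADDWD */
    _mm_storeu_si128((__m128i *)c, vc);
#else
    /* Scalar model of the same operation for builds without SSE2. */
    for (int i = 0; i < 4; i++) {
        c[i] = (int32_t)a[2 * i] * (int32_t)b[2 * i]
             + (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
    }
#endif
}
```

With a0 = -32768, b0 = 1, a1 = 1, b1 = 1, the first result is -32767. If the words were unsigned it would be 32769.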

If you want to write a graphics program, something like that will no doubt be
great to have for certain applications. However, what you need for real
graphics software is 32-bit floats. That's more interesting there.

Yes, these are great instructions for those who want to write their software in
those worlds.

However, anyone who is in the integer world and writing game tree searching
products must really like wasting time on SSE* and stuff like PNI (Prescott New
Instructions) :)
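
For the integer-only operations asked about at the top of the thread (firstBit, lastBit, popCount), here is a minimal sketch in portable C that needs neither SSE nor an undocumented opcode. It uses the well-known SWAR bit-counting trick and is not the inline asm mentioned above. The names `pop_count`, `first_bit` and `last_bit` are illustrative, and no claim is made about how it compares in speed to BSF/BSR on the Opteron.

```c
#include <stdint.h>

/* Number of set bits in x, counted in parallel within the 64-bit word. */
static int pop_count(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);           /* 2-bit sums  */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);               /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;           /* 8-bit sums  */
    return (int)((x * 0x0101010101010101ULL) >> 56);      /* add bytes   */
}

/* Index (0..63) of the least significant set bit, or -1 if x is zero. */
static int first_bit(uint64_t x)
{
    if (x == 0)
        return -1;
    /* x & -x isolates the lowest set bit; minus 1 leaves the bits below it. */
    return pop_count((x & (0 - x)) - 1);
}

/* Index (0..63) of the most significant set bit, or -1 if x is zero. */
static int last_bit(uint64_t x)
{
    if (x == 0)
        return -1;
    /* Smear the highest set bit downwards, then count the resulting ones. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return pop_count(x) - 1;
}
```

For a bitboard with only bits 3 and 4 set (0x18), `pop_count` gives 2, `first_bit` gives 3 and `last_bit` gives 4.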

Did you see DIEP tested on Prescott already? www.aceshardware.com

Hopefully you optimized your program a little so that it is 10 times faster in
nps than DIEP. DIEP will be around 500k nps on 8 processors.

See you there.

>See you in Paderborn!
>
>Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700
