Computer Chess Club Archives


Subject: Re: Opteron Instruction Set

Author: Gerd Isenberg

Date: 12:02:42 02/03/04


On February 03, 2004 at 11:45:20, Vincent Diepeveen wrote:

>On February 03, 2004 at 03:13:29, Gerd Isenberg wrote:
>
>>On February 03, 2004 at 01:03:29, Jay Urbanski wrote:
>>
>>>On February 02, 2004 at 22:41:19, Robert Hyatt wrote:
>>>
>>>>On February 02, 2004 at 20:06:29, David Rasmussen wrote:
>>>>
>>>>>Does the Opteron have firstBit, lastBit and popCount instructions? Or at least
>>>>>something that makes calculating them easier than on x86-32?
>>>>>
>>>>>/David
>>>>
>>>>
>>>>Has the same BSF/BSR instructions, but no popcnt that I have found.  Note
>>>>that BSF/BSR work on 64 bit values if you want.  I have inline asm to do
>>>>all three for gcc if you are interested.
>>>
>>>I understand there is a popcount instruction.  I also understand it's
>>>undocumented.
>>
>>Do you have any opcode or further hints?
>>That would be great - a 4 cycle vector path popcount ;-)
>
>And deadslow.

Yes Vincent, if it exists, 4 cycles is far too optimistic. I guess it is more in
the range of 10-40 cycles; bsf is 9. And doing up to four popcounts in parallel,
as I often do with MMX and/or general-purpose registers, is probably faster than
using four deadslow vector path instructions in a row.
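
For readers who want to see what "popcounts in parallel" can look like, here is a minimal sketch in portable C using only general-purpose registers. It is not the code from my engine: the names popcount64 and popcount64x4 are made up for this illustration, and it uses the well-known SWAR (mask, shift and add) bit count. The idea is that four independent counts have no data dependencies on each other, so their steps can be interleaved and the CPU can overlap them.

```c
#include <stdint.h>

#define M1  0x5555555555555555ULL  /* alternating bits            */
#define M2  0x3333333333333333ULL  /* alternating bit pairs       */
#define M4  0x0F0F0F0F0F0F0F0FULL  /* low nibble of every byte    */
#define H01 0x0101010101010101ULL  /* a 1 in every byte           */

/* SWAR population count of one 64-bit bitboard (hypothetical helper). */
static inline unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & M1);            /* 2-bit sums              */
    x = (x & M2) + ((x >> 2) & M2);     /* 4-bit sums              */
    x = (x + (x >> 4)) & M4;            /* 8-bit sums              */
    return (unsigned)((x * H01) >> 56); /* add all bytes, take top */
}

/* Four independent counts with their steps interleaved by hand.
   The four chains do not depend on each other, so an out-of-order
   core is free to execute them side by side. */
static inline void popcount64x4(const uint64_t in[4], unsigned out[4])
{
    uint64_t a = in[0], b = in[1], c = in[2], d = in[3];

    a -= (a >> 1) & M1;  b -= (b >> 1) & M1;
    c -= (c >> 1) & M1;  d -= (d >> 1) & M1;

    a = (a & M2) + ((a >> 2) & M2);  b = (b & M2) + ((b >> 2) & M2);
    c = (c & M2) + ((c >> 2) & M2);  d = (d & M2) + ((d >> 2) & M2);

    a = (a + (a >> 4)) & M4;  b = (b + (b >> 4)) & M4;
    c = (c + (c >> 4)) & M4;  d = (d + (d >> 4)) & M4;

    out[0] = (unsigned)((a * H01) >> 56);
    out[1] = (unsigned)((b * H01) >> 56);
    out[2] = (unsigned)((c * H01) >> 56);
    out[3] = (unsigned)((d * H01) >> 56);
}
```

Whether the interleaved form actually beats four separate calls depends on the compiler and the CPU, so measure before relying on it.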

My current SSE2 favourite is:

MASKMOVDQU xmmreg1, xmmreg2 (opcode 66h 0Fh F7h): VectorPath, ~43 cycles latency ;-)
(implements a masked conditional write of up to 16 bytes).

But a very interesting SSE2 instruction for eval purposes is:

PMADDWD Packed Multiply Words and Add Doublewords
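Eight 16*16 muls and four 32-bit adds in 4 cycles (double dispatch, like most
SSE2 instructions). With a0..a7 and b0..b7 the signed 16-bit words of the two
operands and c0..c3 the 32-bit results: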

c0 = a0*b0 + a1*b1
c1 = a2*b2 + a3*b3
c2 = a4*b4 + a5*b5
c3 = a6*b6 + a7*b7
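
A small sketch of how the four formulas above map to C, assuming a compiler that provides the SSE2 intrinsics header emmintrin.h. The function names pmaddwd_pairs and weighted_sum8 and the "counts times weights" use are hypothetical, chosen only to illustrate an eval-style dot product. _mm_madd_epi16 is the intrinsic for PMADDWD; a scalar fallback with the same arithmetic is included for builds without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, _mm_madd_epi16 = PMADDWD */
#endif

/* c[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1] for i = 0..3.
   Inputs are signed 16-bit words, results are 32-bit sums. */
static void pmaddwd_pairs(const int16_t a[8], const int16_t b[8],
                          int32_t c[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)c, _mm_madd_epi16(va, vb));
#else
    /* Scalar model of the same arithmetic. It does not reproduce the
       hardware wrap-around when all four inputs of a pair are -32768. */
    for (int i = 0; i < 4; ++i)
        c[i] = (int32_t)a[2 * i] * b[2 * i]
             + (int32_t)a[2 * i + 1] * b[2 * i + 1];
#endif
}

/* Hypothetical eval helper: dot product of eight feature counts
   with eight weights, finished by adding the four partial sums. */
static int32_t weighted_sum8(const int16_t counts[8],
                             const int16_t weights[8])
{
    int32_t c[4];
    pmaddwd_pairs(counts, weights, c);
    return c[0] + c[1] + c[2] + c[3];
}
```

For example, counts {1,2,3,4,5,6,7,8} with weights {1,1,2,2,3,3,4,4} give the partial sums 3, 14, 33 and 60, so weighted_sum8 returns 110.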

See you in Paderborn!

Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700