Author: Gerd Isenberg
Date: 12:02:42 02/03/04
On February 03, 2004 at 11:45:20, Vincent Diepeveen wrote:

>On February 03, 2004 at 03:13:29, Gerd Isenberg wrote:
>
>>On February 03, 2004 at 01:03:29, Jay Urbanski wrote:
>>
>>>On February 02, 2004 at 22:41:19, Robert Hyatt wrote:
>>>
>>>>On February 02, 2004 at 20:06:29, David Rasmussen wrote:
>>>>
>>>>>Does the Opteron have firstBit, lastBit and popCount instructions? Or at least
>>>>>something that makes calculating them easier than on x86-32?
>>>>>
>>>>>/David
>>>>
>>>>
>>>>Has the same BSF/BSR instructions, but no popcnt that I have found. Note
>>>>that BSF/BSR work on 64 bit values if you want. I have inline asm to do
>>>>all three for gcc if you are interested.
>>>
>>>I understand there is a popcount instruction. I also understand it's
>>>undocumented.
>>
>>Do you have any opcode or further hints?
>>That would be great - a 4 cycle vector path popcount ;-)
>
>And deadslow.

Yes Vincent, if it exists, 4 cycles is far too optimistic. I guess it is more in
the range of 10-40 cycles; bsf is 9. And doing up to four popcounts in parallel,
as I often do with MMX and/or general-purpose registers, is probably faster than
using four dead-slow vector path instructions in a row.

My current SSE2 favourite is:

MASKMOVDQU xmmreg1, xmmreg2    66h 0Fh F7h    VectorPath    ~ 43 cycles latency ;-)

(It implements a masked conditional write of up to 16 bytes.)

But a very interesting SSE2 instruction for eval purposes is:

PMADDWD    Packed Multiply Words and Add Doublewords

Eight 16*16 muls and four 32-bit adds in 4 cycles (double dispatch, as with most
SSE2 instructions):

c0 = a0*b0 + a1*b1
c1 = a2*b2 + a3*b3
c2 = a4*b4 + a5*b5
c3 = a6*b6 + a7*b7

See you in Paderborn!
Gerd
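To make the PMADDWD formulas above concrete, here is a minimal scalar sketch in portable C of what the instruction computes on one 128-bit register. It is only a model for illustration, not the code from the post: the names `pmaddwd_model` and `eval_dot8` are hypothetical, and the idea of summing the four lanes into one evaluation term is an assumption about how it might be used in an eval. On real SSE2 hardware the same operation is available through the `_mm_madd_epi16` intrinsic from `<emmintrin.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PMADDWD on one 128-bit register:
 * eight signed 16-bit lanes in, four signed 32-bit lanes out.
 *   c[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1]
 * (hypothetical helper name, for illustration only) */
void pmaddwd_model(const int16_t a[8], const int16_t b[8], int32_t c[4])
{
    assert(a && b && c);
    for (int i = 0; i < 4; ++i) {
        /* Widen to 32 bits before multiplying so the products are exact. */
        int32_t lo = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        /* Add as unsigned so the one overflowing case (all four inputs
         * equal to -32768) wraps instead of being undefined behaviour. */
        c[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}

/* Assumed eval-style use: a dot product of eight feature values with
 * eight weights, obtained by summing the four 32-bit lanes. */
int32_t eval_dot8(const int16_t features[8], const int16_t weights[8])
{
    int32_t c[4];
    pmaddwd_model(features, weights, c);
    return c[0] + c[1] + c[2] + c[3];
}
```

For example, with features 1..8 and weights 10, 20, ..., 80, the four lanes come out as 50, 250, 610 and 1130, and `eval_dot8` returns 2040.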
Last modified: Thu, 15 Apr 21 08:11:13 -0700