Author: Gerd Isenberg
Date: 13:33:26 07/24/04
On July 24, 2004 at 14:45:11, Gerd Isenberg wrote:
>On July 24, 2004 at 04:24:06, Volker Böhm wrote:
>
>>Hi Gerd,
>>
>>I am always impressed by your algorithms. Are they usable on a PIII
>>architecture like the PIII itself, PIII-Celeron or Pentium M (Centrino)?
>>
>>Greetings Volker
>
>Hi Volker,
>
>thanks for your kind words. Not sure about Centrino. PIII has no SSE2
>instructions as far as I know, but MMX with eight 64-bit registers and similar
>instructions. There is an MMX pmaxsw instruction that works on four 16-bit
>signed words as well.
>
>Gerd
The MMX version is even faster on my Athlon64 2.2GHz box: 7.4ns!
That demonstrates the relatively poor SSE2 performance due to double versus direct
path instructions and 64-bit ALUs, and gives an idea of what is possible if SSE2 ALUs
become 128 bits wide.
It is really a shame that Win64 for AMD64 does not support this instruction set
anymore. The eight MMX registers shared with x87 are no longer saved/restored
during a context switch. So if you like to do multi-threading ;-(
Gerd
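int getMaxOf64(short int s64[] /* aligned 8 */)
{
    __asm
    {
        mov     eax, [s64]          ; pointer to the 64 shorts
        movq    mm1, [eax+ 0*8]     ; two accumulators of four words each
        movq    mm0, [eax+ 1*8]
        pmaxsw  mm1, [eax+ 2*8]     ; word-wise signed max, alternating mm1/mm0
        pmaxsw  mm0, [eax+ 3*8]
        pmaxsw  mm1, [eax+ 4*8]
        pmaxsw  mm0, [eax+ 5*8]
        pmaxsw  mm1, [eax+ 6*8]
        pmaxsw  mm0, [eax+ 7*8]
        pmaxsw  mm1, [eax+ 8*8]
        pmaxsw  mm0, [eax+ 9*8]
        pmaxsw  mm1, [eax+10*8]
        pmaxsw  mm0, [eax+11*8]
        pmaxsw  mm1, [eax+12*8]
        pmaxsw  mm0, [eax+13*8]
        pmaxsw  mm1, [eax+14*8]
        pmaxsw  mm0, [eax+15*8]
        pmaxsw  mm0, mm1            ; merge both accumulators
        movq    mm1, mm0
        psrlq   mm0, 16
        pmaxsw  mm0, mm1            ; word 0 = max(w0,w1), word 2 = max(w2,w3)
        movq    mm1, mm0
        psrlq   mm0, 32
        pmaxsw  mm0, mm1            ; word 0 = max of all four words
        movd    eax, mm0
        cwde                        ; sign-extend ax, result is returned in eax
        emms                        ; clear MMX state before any later x87 code
    }
}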
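
For readers without an MMX-capable compiler, here is a rough portable C sketch of what the routine above computes. It is only an illustration and not tuned for speed. The names getMaxOf64Portable and max4 are made up for this sketch, and int16_t stands in for short int. It keeps the same structure: two independent accumulators of four words (like mm1 and mm0), a merge, then a fold of the four remaining words down to one.

```c
#include <stdint.h>

/* Word-wise signed 16-bit maximum over four lanes, the operation that
   pmaxsw performs on one 64-bit MMX register. Hypothetical helper. */
static void max4(int16_t acc[4], const int16_t src[4])
{
    for (int lane = 0; lane < 4; ++lane)
        if (src[lane] > acc[lane])
            acc[lane] = src[lane];
}

/* Hypothetical portable counterpart of getMaxOf64 (name is illustrative). */
int getMaxOf64Portable(const int16_t s64[64])
{
    int16_t a[4], b[4];

    /* Two independent accumulators, playing the roles of mm1 and mm0. */
    for (int lane = 0; lane < 4; ++lane) {
        a[lane] = s64[lane];
        b[lane] = s64[4 + lane];
    }

    /* Remaining 14 groups of four words, alternating between accumulators. */
    for (int group = 2; group < 16; group += 2) {
        max4(a, &s64[4 * group]);
        max4(b, &s64[4 * (group + 1)]);
    }

    /* Merge the two accumulators (pmaxsw mm0, mm1). */
    max4(b, a);

    /* Horizontal fold: first neighbouring pairs (the shift by 16),
       then the two pair results (the shift by 32). */
    int16_t m01 = b[0] > b[1] ? b[0] : b[1];
    int16_t m23 = b[2] > b[3] ? b[2] : b[3];

    /* Promotion to int mirrors the sign extension done by cwde. */
    return m01 > m23 ? m01 : m23;
}
```

Using two accumulators instead of one keeps the dependency chains short, which is presumably why the assembly alternates between mm1 and mm0.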