Author: Gerd Isenberg
Date: 04:06:22 06/15/04
Go up one level in this thread
On June 15, 2004 at 05:56:05, Gian-Carlo Pascutto wrote:
>On June 14, 2004 at 16:15:17, Dieter Buerssner wrote:
>
>>I collected some suggested routines. See for the source code later. My results
>>on P4 2.53 GHz:
>>
>>With Gcc 3.1
>>
>>Testing static __inline__ FindFirst_GCP
>>sum: 89.909 s for 805305600 calls, 111.646 ns/call
>>With MSVC 7.0 free command line tools
>>Testing static __forceinline FindFirst_GCP
>>sum: 159.459 s for 805305600 calls, 198.011 ns/call
>
>Oi oi oi...seems like 64 bit handling by 32 bit compilers
>still sucks.
>
>Anyway clearly my routine should not be used on a 32 bit PC.
For sure!
If i look to imul/mul latencies, decode type in the various optimization
manuals:
AMD64 imul/mul:
IMUL reg16/32/64, mreg16/32/64 0Fh AFh 11-xxx-xxx DirectPath 3/3/4
MUL mreg32 F7h 11-100-xxx Double 3
MUL mreg64 F7h 11-100-xxx Double 5
so one 64-bit mul 4 cycles Direct Path.
AMD32 imul/mul:
IMUL reg16/32, mreg16/32 0Fh AFh 11-xxx-xxx VectorPath 3/4
MUL mreg32 F7h 11-100-xxx VectorPath 6
P4 imul/mul:
Instruction Latency 1 Throughput Execution Unit
IMUL r32 14 3 FP_MUL
IMUL 15-18 5
MUL 14-18 5
the X86-32 assembly of your routine by my generator:
const BitBoard magic = 0x03f08c5392f756cd; // the 125856687.
const unsigned int magictable[64] =
{
0, 1, 12, 2, 13, 22, 17, 3,
14, 33, 23, 36, 18, 58, 28, 4,
62, 15, 34, 26, 24, 48, 50, 37,
19, 55, 59, 52, 29, 44, 39, 5,
63, 11, 21, 16, 32, 35, 57, 27,
61, 25, 47, 49, 54, 51, 43, 38,
10, 20, 31, 56, 60, 46, 53, 42,
9, 30, 45, 41, 8, 40, 7, 6,
};
unsigned int bitScan (BitBoard b) {
return magictable[((b&-b)*magic) >> 58];
}
?bitScan@@YAI_K@Z PROC NEAR ; bitScan, COMDAT
; 23 : return magictable[((b&-b)*magic) >> 58];
00000 8b 44 24 04 mov eax, DWORD PTR _b$[esp-4]
00004 8b 4c 24 08 mov ecx, DWORD PTR _b$[esp]
00008 8b d0 mov edx, eax
0000a 56 push esi
0000b f7 da neg edx
0000d 8b f1 mov esi, ecx
0000f 68 53 8c f0 03 push 66096211 ; 03f08c53H
00014 83 d6 00 adc esi, 0
00017 23 d0 and edx, eax
00019 f7 de neg esi
0001b 23 f1 and esi, ecx
0001d 68 cd 56 f7 92 push -1829284147 ; 92f756cdH
00022 56 push esi
00023 52 push edx
00024 e8 00 00 00 00 call __allmul
00029 c1 ea 1a shr edx, 26 ; 0000001aH
0002c 5e pop esi
0002d 8b 04 95 00 00
00 00 mov eax, DWORD PTR _magictable[edx*4]
; 24 : }
00034 c3 ret 0
?bitScan@@YAI_K@Z ENDP ; bitScan
the 64-bit mul routine conditionally performs up to three 32-bit muls:
__allmul
004013D0 8B 44 24 08 mov eax,dword ptr [esp+8]
004013D4 8B 4C 24 10 mov ecx,dword ptr [esp+10h]
004013D8 0B C8 or ecx,eax
004013DA 8B 4C 24 0C mov ecx,dword ptr [esp+0Ch]
004013DE 75 09 jne 004013E9
004013E0 8B 44 24 04 mov eax,dword ptr [esp+4]
004013E4 F7 E1 mul eax,ecx
004013E6 C2 10 00 ret 10h
004013E9 53 push ebx
004013EA F7 E1 mul eax,ecx
004013EC 8B D8 mov ebx,eax
004013EE 8B 44 24 08 mov eax,dword ptr [esp+8]
004013F2 F7 64 24 14 mul eax,dword ptr [esp+14h]
004013F6 03 D8 add ebx,eax
004013F8 8B 44 24 08 mov eax,dword ptr [esp+8]
004013FC F7 E1 mul eax,ecx
004013FE 03 D3 add edx,ebx
00401400 5B pop ebx
00401401 C2 10 00 ret 10h
>
>--
>GCP
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.