Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: FirstOne/LastOne

Author: Gerd Isenberg
Date: 04:06:22 06/15/04
On June 15, 2004 at 05:56:05, Gian-Carlo Pascutto wrote:

>On June 14, 2004 at 16:15:17, Dieter Buerssner wrote:
>
>>I collected some suggested routines. See for the source code later. My results
>>on P4 2.53 GHz:
>>
>>With Gcc 3.1
>>
>>Testing static __inline__ FindFirst_GCP
>>sum:  89.909 s for 805305600 calls, 111.646 ns/call
>>With MSVC 7.0 free command line tools
>>Testing static __forceinline FindFirst_GCP
>>sum: 159.459 s for 805305600 calls, 198.011 ns/call
>
>Oi oi oi...seems like 64 bit handling by 32 bit compilers
>still sucks.
>
>Anyway clearly my routine should not be used on a 32 bit PC.

For sure!

If i look to imul/mul latencies, decode type in the various optimization
manuals:

AMD64 imul/mul:

IMUL reg16/32/64, mreg16/32/64 0Fh AFh 11-xxx-xxx DirectPath 3/3/4
MUL  mreg32                        F7h 11-100-xxx Double     3
MUL  mreg64                        F7h 11-100-xxx Double     5

so one 64-bit mul 4 cycles Direct Path.

AMD32 imul/mul:

IMUL reg16/32, mreg16/32    0Fh AFh 11-xxx-xxx VectorPath    3/4
MUL  mreg32                     F7h 11-100-xxx VectorPath    6

P4 imul/mul:
Instruction Latency 1 Throughput Execution Unit
IMUL r32       14     3          FP_MUL
IMUL        15-18     5
MUL         14-18     5


the X86-32 assembly of your routine by my generator:

const BitBoard magic =  0x03f08c5392f756cd; // the 125856687.

const unsigned int magictable[64] =
{

    0,  1, 12,  2, 13, 22, 17,  3,
   14, 33, 23, 36, 18, 58, 28,  4,
   62, 15, 34, 26, 24, 48, 50, 37,
   19, 55, 59, 52, 29, 44, 39,  5,
   63, 11, 21, 16, 32, 35, 57, 27,
   61, 25, 47, 49, 54, 51, 43, 38,
   10, 20, 31, 56, 60, 46, 53, 42,
    9, 30, 45, 41,  8, 40,  7,  6,
};

unsigned int bitScan (BitBoard b) {
    return magictable[((b&-b)*magic) >> 58];
}

?bitScan@@YAI_K@Z PROC NEAR				; bitScan, COMDAT

; 23   :     return magictable[((b&-b)*magic) >> 58];

  00000	8b 44 24 04	 mov	 eax, DWORD PTR _b$[esp-4]
  00004	8b 4c 24 08	 mov	 ecx, DWORD PTR _b$[esp]
  00008	8b d0		 mov	 edx, eax
  0000a	56		 push	 esi
  0000b	f7 da		 neg	 edx
  0000d	8b f1		 mov	 esi, ecx
  0000f	68 53 8c f0 03	 push	 66096211		; 03f08c53H
  00014	83 d6 00	 adc	 esi, 0
  00017	23 d0		 and	 edx, eax
  00019	f7 de		 neg	 esi
  0001b	23 f1		 and	 esi, ecx
  0001d	68 cd 56 f7 92	 push	 -1829284147		; 92f756cdH
  00022	56		 push	 esi
  00023	52		 push	 edx
  00024	e8 00 00 00 00	 call	 __allmul
  00029	c1 ea 1a	 shr	 edx, 26		; 0000001aH
  0002c	5e		 pop	 esi
  0002d	8b 04 95 00 00
	00 00		 mov	 eax, DWORD PTR _magictable[edx*4]

; 24   : }

  00034	c3		 ret	 0
?bitScan@@YAI_K@Z ENDP					; bitScan

the 64-bit mul routine conditionally performs up to three 32-bit muls:
__allmul

004013D0 8B 44 24 08          mov         eax,dword ptr [esp+8]
004013D4 8B 4C 24 10          mov         ecx,dword ptr [esp+10h]
004013D8 0B C8                or          ecx,eax
004013DA 8B 4C 24 0C          mov         ecx,dword ptr [esp+0Ch]
004013DE 75 09                jne         004013E9
004013E0 8B 44 24 04          mov         eax,dword ptr [esp+4]
004013E4 F7 E1                mul         eax,ecx
004013E6 C2 10 00             ret         10h
004013E9 53                   push        ebx
004013EA F7 E1                mul         eax,ecx
004013EC 8B D8                mov         ebx,eax
004013EE 8B 44 24 08          mov         eax,dword ptr [esp+8]
004013F2 F7 64 24 14          mul         eax,dword ptr [esp+14h]
004013F6 03 D8                add         ebx,eax
004013F8 8B 44 24 08          mov         eax,dword ptr [esp+8]
004013FC F7 E1                mul         eax,ecx
004013FE 03 D3                add         edx,ebx
00401400 5B                   pop         ebx
00401401 C2 10 00             ret         10h



>
>--
>GCP
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.