Computer Chess Club Archives



Subject: Re: x86-64 ASM output of a simple test program

Author: Dezhi Zhao

Date: 18:38:48 09/30/05



PUBLIC	?diagonal1Attacks@@YA_K_K0@Z			; diagonal1Attacks
; Function compile flags: /Ogtpy
;	COMDAT ?diagonal1Attacks@@YA_K_K0@Z
_TEXT	SEGMENT
deo$ = 8
pro$ = 16
?diagonal1Attacks@@YA_K_K0@Z PROC			; diagonal1Attacks, COMDAT

; 36   : 	return leftDownAttacks(deo, pro) | rightUpAttacks(deo, pro);

	mov	r10, rdx
	mov	r9, rcx
	mov	r8, rcx
	shr	r9, 9
	mov	rax, 9187201950435737471		; 7f7f7f7f7f7f7f7fH
	and	r10, rax
	and	r9, r10
	mov	rax, r10
	shr	rax, 9
	or	r9, rcx
	shl	rcx, 9
	and	r10, rax
	mov	rax, r9
	shr	rax, 18
	and	rax, r10
	or	r9, rax
	mov	rax, -72340172838076674			; fefefefefefefefeH
	and	rdx, rax
	and	rcx, rdx
	mov	rax, rdx
	shl	rax, 9
	or	rcx, r8
	and	rdx, rax
	mov	rax, rcx
	shl	rax, 18
	and	rax, rdx
	or	rcx, rax
	mov	rax, rcx
	and	rax, -2057				; fffffffffffff7f7H
	shl	rax, 18
	and	rax, rdx
	shl	rax, 18
	and	rax, rdx
	mov	rdx, -141289400074369			; ffff7f7f7f7f7f7fH
	and	rcx, rdx
	mov	rdx, r9
	or	rax, rcx
	shr	rdx, 36					; 00000024H
	mov	rcx, r10
	shr	rcx, 18
	shl	rax, 9
	and	rdx, rcx
	mov	rcx, 35887507618889599			; 007f7f7f7f7f7f7fH
	and	rdx, r10
	or	rdx, r9
	shr	rdx, 9
	and	rdx, rcx
	or	rax, rdx

; 37   : }

	ret	0
?diagonal1Attacks@@YA_K_K0@Z ENDP			; diagonal1Attacks
_TEXT	ENDS
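
The constant ffff7f7f7f7f7f7fH in the listing looks like the notA mask moved in front of the final shl 9, which matches the "pre mask" remark quoted below. A minimal sketch to check that identity in isolation; the names preMask, shiftThenMask, maskThenShift and checkPreMask are made up for illustration and do not come from the original source:

```cpp
#include <cassert>
#include <cstdint>

using BitBoard = std::uint64_t;

constexpr BitBoard notA    = 0xfefefefefefefefeULL;
constexpr BitBoard preMask = 0xffff7f7f7f7f7f7fULL; // constant seen in the listing

// Mask after the shift, as written in the C source.
constexpr BitBoard shiftThenMask(BitBoard x) { return (x << 9) & notA; }

// Mask before the shift, as the compiler output appears to do.
// The top 9 bits of preMask are "don't care": they are shifted out anyway.
constexpr BitBoard maskThenShift(BitBoard x) { return (x & preMask) << 9; }

// Both forms are plain per-bit masks, so agreeing on all-ones
// is enough to show they agree on every input.
static_assert(shiftThenMask(~0ULL) == maskThenShift(~0ULL),
              "pre-mask and post-mask forms differ");

// Hypothetical runtime helper for spot checks on arbitrary boards.
inline void checkPreMask(BitBoard x) {
    assert(shiftThenMask(x) == maskThenShift(x));
}
```

Whether the pre-mask form actually saves anything depends on how the compiler schedules the surrounding instructions; the sketch only shows that the two forms compute the same value.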

On September 30, 2005 at 12:24:50, Gerd Isenberg wrote:

>
>>Yes, it looks fine. I prefer to use one more volatile register (r8/r9) to hold
>>the manifest constant and save one load operation.
>>
>
>hmm... yes, probably due to the long opcode for loading an immediate 64-bit
>constant into a register. OTOH, even if r8/r9 are volatile registers, a caller
>using them need not save/restore them if a leaf callee does not touch them.
>I also have the impression that MSVC is a bit conservative in using architectural
>registers - maybe for a good reason.
>
>And in this special case, the second constant is changed to a pre-mask value
>(ffff7f7f7f7f7f7fH) by some nice optimization process; see the shift left 36.
>
>As already mentioned, if more directions are processed in parallel with inlined
>Kogge-Stone routines, instructions may hopefully be scheduled more in parallel
>with, let's say, 2*3 registers...
>
>Gerd
>
>
>typedef unsigned __int64 BitBoard;
>
>BitBoard notA = 0xfefefefefefefefe;
>BitBoard notH = 0x7f7f7f7f7f7f7f7f;
>
>// deo - bishops/queens, pro - set of empty squares
>
>__forceinline
>BitBoard rightUpAttacks(BitBoard deo, BitBoard pro) {
> pro  = pro &  notA;
> deo |= pro & (deo <<  9);
> pro  = pro & (pro <<  9);
> deo |= pro & (deo << 18);
> pro  = pro & (pro << 18);
> deo |= pro & (deo << 36);
> return (deo << 9) & notA;
>}
>
>__forceinline
>BitBoard leftDownAttacks(BitBoard deo, BitBoard pro) {
> pro  = pro &  notH;
> deo |= pro & (deo >>  9);
> pro  = pro & (pro >>  9);
> deo |= pro & (deo >> 18);
> pro  = pro & (pro >> 18);
> deo |= pro & (deo >> 36);
> return (deo >> 9) & notH;
>}
>
>BitBoard diagonal1Attacks(BitBoard deo, BitBoard pro) {
>  return leftDownAttacks(deo, pro) | rightUpAttacks(deo, pro);
>}
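
For anyone who wants to try the quoted routines outside MSVC, here is a rough portable sketch. It assumes std::uint64_t in place of unsigned __int64 and plain inline in place of __forceinline. slowDiagonal1Attacks and checkDiagonal1 are hypothetical helpers added only for cross-checking; they are not part of the original code:

```cpp
#include <cassert>
#include <cstdint>

using BitBoard = std::uint64_t;

constexpr BitBoard notA = 0xfefefefefefefefeULL;
constexpr BitBoard notH = 0x7f7f7f7f7f7f7f7fULL;

// deo - bishops/queens, pro - set of empty squares
inline BitBoard rightUpAttacks(BitBoard deo, BitBoard pro) {
    pro &= notA;                  // keep the fill from wrapping H -> A
    deo |= pro & (deo <<  9);     // fill 1 step
    pro &=       (pro <<  9);
    deo |= pro & (deo << 18);     // fill 2 more steps
    pro &=       (pro << 18);
    deo |= pro & (deo << 36);     // fill 4 more steps
    return (deo << 9) & notA;     // one final step gives the attacked squares
}

inline BitBoard leftDownAttacks(BitBoard deo, BitBoard pro) {
    pro &= notH;                  // keep the fill from wrapping A -> H
    deo |= pro & (deo >>  9);
    pro &=       (pro >>  9);
    deo |= pro & (deo >> 18);
    pro &=       (pro >> 18);
    deo |= pro & (deo >> 36);
    return (deo >> 9) & notH;
}

inline BitBoard diagonal1Attacks(BitBoard deo, BitBoard pro) {
    return leftDownAttacks(deo, pro) | rightUpAttacks(deo, pro);
}

// Hypothetical reference: walk each ray one square at a time.
inline BitBoard slowDiagonal1Attacks(BitBoard deo, BitBoard pro) {
    BitBoard attacks = 0;
    for (BitBoard ray = deo; ray; ) {
        ray = (ray << 9) & notA;  // step right-up
        attacks |= ray;
        ray &= pro;               // continue only through empty squares
    }
    for (BitBoard ray = deo; ray; ) {
        ray = (ray >> 9) & notH;  // step left-down
        attacks |= ray;
        ray &= pro;
    }
    return attacks;
}

// Hypothetical spot check: bishop on a1 with a blocker on d4.
inline void checkDiagonal1() {
    const BitBoard a1 = 1ULL, d4 = 1ULL << 27;
    assert(diagonal1Attacks(a1, ~(a1 | d4)) ==
           slowDiagonal1Attacks(a1, ~(a1 | d4)));
}
```

With a single bishop on a1 and every other square empty, both versions should return the squares b2 through h8; with a blocker on d4 the ray should stop at d4 inclusive.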


