Author: Gerd Isenberg
Date: 23:21:14 07/14/03
On July 14, 2003 at 19:38:40, Sune Fischer wrote:
>On July 14, 2003 at 18:59:16, Gerd Isenberg wrote:
>>
>>Read again, the ShiftL45[s] lookup is not necessary.
>>
>>__forceinline
>>BitBoard A1H8Attacks(unsigned int sq) const
>>{
>> return sA1H8Atta[sq]
>> [(*(((BYTE*)&(m_OccuBBA1H8))+((sq-Rank(sq))&7))&0x7e)>>1];
>> // diaindex = (file-rank) & 7
>>}
>>
>>Gerd
>>
>
>Thanks Gerd, I'll think it over.
>
>My transformation is a little different, I think. You also have one shift
>and one AND; you've just replaced the table with some "magic". Could be faster,
>I guess.
>
>I once heard that working with byte-size variables is slower than native
>integer (32-bit) sizes, so I'm not fully convinced 8-bit casting is that much
>faster(?).
>
>-S.
The natural processor word length is almost always best. But for extracting an
aligned byte from a bitboard, I guess movzx does quite well. 64-bit shifting is
much more expensive on x86-32.
Gerd
AMD Athlon(TM) Processor x86 Code Optimization Guide, page 113:
Use the MOVZX and MOVSX instructions to zero-extend and sign-extend byte-size
and word-size operands to doubleword length. Typical code for zero extension
that replaces MOVZX, as shown in Example 1 (Avoid), uses more decode and
execution resources than MOVZX. It also has higher latency due to the
superset dependency between the XOR and the MOV which requires a merge
operation.
Example 1 (Avoid):
XOR EAX,EAX
MOV AL,[MEM ]
Example 1 (Preferred):
MOVZX EAX,BYTE PTR [MEM ]
---
MOVZX reg16/32, mreg8: opcode 0Fh B6h, ModR/M 11-xxx-xxx, DirectPath, 1-cycle latency