Author: Gerd Isenberg
Date: 11:45:45 01/20/03
On January 19, 2003 at 14:01:56, Matt Taylor wrote:

>Interesting to note that several of those routines rely on technically undefined
>behavior. Under the bsf instruction, the manual states that, "...If the contents
>of the source operand are 0, the contents of the destination operand are
>undefined." Conveniently it seems that this works on all existing
>implementations.
>
>A similar trick can be used with shifts. Integer shift instructions mask their
>shift count to avoid unnecessary work. As a result, shifting by 32 does not
>change the destination operand.
>
>I probably won't optimize your code for Pentium 4. I was rather annoyed when
>some code I wrote executed about as fast on my Pentium 90 as it would on a
>high-end Pentium 4. All the old tricks are now expensive. Shifting is 4 clocks
>latency. The full adder (adc/sbb) is 2-3 clocks -throughput-. Latency is 6-8
>clocks. The setcc instruction is 5 clocks latency. Every one of these
>instructions has a latency of 1 on Athlon and the original Pentium. They all
>execute with a throughput of up to 3 instructions per cycle (1/3) on Athlon and
>2 instructions per cycle (1/2) on original Pentium. Sigh.
>
>I'll optimize it for Athlon since I am now most familiar with its rules, and I
>have tools to analyze the code. Taking a look now...
>
>-Matt

Hi Matt,

one question about your slightly modified BitBoard(1)<<sq code, which you posted recently:

BitBoard getSquareBB(int sq)
{
    _asm
    {
        mov     ecx, [sq]   ; I want to skip this one
        mov     edx, 1
        xor     eax, eax
        shl     edx, cl
        test    cl, 32
        mov     ecx, eax
        cmovz   eax, edx
        cmovz   edx, ecx
    }
}

This works fine so far with MSC6.0. But if I try to use __fastcall to force parameter passing via register (the first one is ecx by convention, which would be rather fine here), the following function sucks in release mode:
__forceinline BitBoard __fastcall getSquareBB(int sq)
{
    _asm
    {
        // mov  ecx, [sq]   ; I want to skip this one
        // oops, but ecx generally does not hold the right value
        mov     edx, 1
        xor     eax, eax
        shl     edx, cl
        test    cl, 32
        mov     ecx, eax
        cmovz   eax, edx
        cmovz   edx, ecx
    }
}

So far I have found no way to force the compiler to pass a parameter to inlined asm routines via the ecx register. The same goes for the asm bsf routines and others. I always get unnecessary stores/loads at the beginning of those functions:

0040109F 89 74 24 10      mov   dword ptr [esp+10h],esi
004010A3 8B 4C 24 10      mov   ecx,dword ptr [esp+10h]   // instead of mov ecx, esi
004010A7 BA 01 00 00 00   mov   edx,1
004010AC 33 C0            xor   eax,eax
004010AE D3 E2            shl   edx,cl
004010B0 F6 C1 20         test  cl,20h
004010B3 8B C8            mov   ecx,eax
004010B5 0F 44 C2         cmove eax,edx
004010B8 0F 44 D1         cmove edx,ecx

Any hint about this?

Thanks in advance,
Gerd
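
For readers following the asm, here is a minimal portable C sketch of what the routine computes. It assumes BitBoard is a 64-bit unsigned integer (the post does not show its definition), and the name getSquareBB_sketch is hypothetical. It is not a fix for the __fastcall problem, only an illustration of the shift-count masking and of the half selection done by the cmovz pair; no claim is made about how tight the code is that a given compiler generates from it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: BitBoard is a 64-bit unsigned integer. */
typedef uint64_t BitBoard;

/* Hypothetical helper mirroring the asm routine step by step.
   lo corresponds to eax, hi to edx in the returned edx:eax pair. */
static inline BitBoard getSquareBB_sketch(int sq)
{
    assert(sq >= 0 && sq < 64);

    /* shl edx, cl: the CPU masks the count to 5 bits by itself.
       In C a 32-bit shift by 32 or more is undefined, so mask explicitly. */
    uint32_t bit = (uint32_t)1 << (sq & 31);

    /* test cl, 32 plus the two cmovz: bit 5 of sq selects the half. */
    uint32_t lo = (sq & 32) ? 0u : bit;
    uint32_t hi = (sq & 32) ? bit : 0u;

    return ((BitBoard)hi << 32) | lo;
}
```

For example, getSquareBB_sketch(35) puts bit 3 into the high half, which is the same value as (BitBoard)1 << 35.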
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.