Computer Chess Club Archives


Subject: Re: Saved Another Cycle -- Woohoo!

Author: Matt Taylor

Date: 01:03:57 01/07/03

On January 07, 2003 at 01:29:12, Walter Faxon wrote:

>Hi, Matt.
>
>I haven't been able to completely follow your code (yet!), but there does seem
>to be one tiny bug, here marked by "<==".
>
>Also, glad you could use some of my "yabs()" logic -- if that's where you got
>it! :)
>
>-- Walter
>
>
>On January 06, 2003 at 13:12:24, Matt Taylor wrote:
>
><snip>
>
>>    ; Note, instructions dispatching in the same cycle are grouped.
>>    ; Note preservation of ebx/esi.
>>    ; Note: this routine tailored to Crafty's bit ordering!
>>
>>    push       esi
>>    mov        esi, DWORD PTR [bb]
>>    xor        eax, eax
>>
>>    mov        ecx, DWORD PTR [bb+4]
>>    xor        ebx, ebx               ; <== ebx zeroed...
>>    test       esi, esi
>>
>>    ; 2 cycles
>>    cmovz      esi, ecx
>>    push       ebx                    ; <== ...prior to its preservation
>>    setz       al
>>    mov        ecx, esi
>>    neg        esi
>
><snip remaining code, etc.>
>
>>-Matt

Oops! Good point. I checked that it reproduced the same answers, but I didn't
check register preservation. A little rearranging (pushing ebx before zeroing
it) should fix that. The code I'm working with doesn't have that problem; the
error crept in when I (quickly) revised it for Crafty.

However, it has come to my attention that the table-based version, though it
looks far less efficient, is actually faster: about 6 cycles faster, in fact. It
is even faster than the bsf instruction. Since, from what I've read, bsf will
only get slower on Hammer, it seems the table is a win over bsf on any AMD chip.
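To make the comparison concrete, here is a minimal C sketch of one table-driven way to find the lowest set bit of a 64-bit bitboard. It is not the table-based routine referred to above (that code isn't shown in this thread); the 256-entry table `lsb_table` and the names `init_lsb_table` and `first_one_table` are hypothetical, and bits are numbered from the least significant end, which may not match Crafty's ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: lsb_table[b] = index of the lowest set bit of byte b.
   Entry 0 is never used because callers only look up nonzero bytes. */
static uint8_t lsb_table[256];

static void init_lsb_table(void) {
    for (int b = 1; b < 256; b++) {
        int i = 0;
        while (!((b >> i) & 1))
            i++;
        lsb_table[b] = (uint8_t)i;
    }
}

/* Index (0..63) of the lowest set bit of bb, with bit 0 = least significant
   bit (an assumption of this sketch). bb must be nonzero. */
static int first_one_table(uint64_t bb) {
    assert(bb != 0);
    uint32_t lo = (uint32_t)bb;
    /* Pick the low 32-bit half if it has any bits, else the high half. */
    uint32_t half = lo ? lo : (uint32_t)(bb >> 32);
    int base = lo ? 0 : 32;
    /* Narrow to the lowest nonzero byte, then finish with one table load. */
    if ((half & 0xFFFFu) == 0) { half >>= 16; base += 16; }
    if ((half & 0xFFu) == 0)   { half >>= 8;  base += 8; }
    return base + lsb_table[half & 0xFFu];
}
```

Call `init_lsb_table()` once at startup; after that, each lookup costs a few compares and shifts plus a single byte load, which is the kind of trade that can beat a slow bsf.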

I am a bit annoyed. I thought I had reduced the cycle count to 13, but then I
discovered (1) that there is some Athlon stall rule I'm not familiar with and
(2) that I was flipping index bit 4 with no means to flip it back. Reading
Eugene's code earlier reminded me of the tricks the carry flag can be used for.
The trick only works for index bit 4, because I can't find a way to get the
carry flag set by any of the other comparisons. Anyway, I have to complement the
carry flag to get a correct i5, which brings the cycle count to 14. Two other
stalls, which I'll research tomorrow, also appear... sigh.
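One way to picture the comparison-based approach and the carry-flag trick is the C sketch below. It is an illustration, not the routine from this thread: the helper names `single_bit_index32` and `first_one_cmp` are hypothetical, and bits are numbered from the least significant end. After isolating the lowest set bit, each index bit is a test against a mask; for bit 4 the test is written as `x < 0x10000`, which is the borrow an x86 `cmp x, 10000h` would leave in the carry flag. That borrow is the complement of index bit 4, so it has to be inverted (as `cmc` would do) before it can be used.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..31) of the single set bit in x, built from comparisons rather
   than a table or bsf. x must have exactly one bit set. */
static int single_bit_index32(uint32_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    int i = 0;
    i |= ((x & 0xAAAAAAAAu) != 0) << 0;
    i |= ((x & 0xCCCCCCCCu) != 0) << 1;
    i |= ((x & 0xF0F0F0F0u) != 0) << 2;
    i |= ((x & 0xFF00FF00u) != 0) << 3;
    /* Unsigned "x < 0x10000" is the carry (borrow) flag after cmp x, 10000h.
       It is the complement of index bit 4, so invert it, like cmc. */
    int borrow = x < 0x10000u;
    i |= (!borrow) << 4;
    return i;
}

/* Lowest set bit (0..63) of a nonzero 64-bit board, bit 0 = LSB. */
static int first_one_cmp(uint64_t bb) {
    assert(bb != 0);
    uint32_t lo = (uint32_t)bb;
    int high = (lo == 0);                                 /* like test/setz */
    uint32_t half = high ? (uint32_t)(bb >> 32) : lo;     /* like cmovz */
    uint32_t x = half & (0u - half);                      /* isolate lowest bit */
    return (high << 5) | single_bit_index32(x);
}
```

Each mask test maps naturally onto a test/setcc pair, while the bit 4 test is the one that can come "for free" from the carry flag, at the cost of the extra complement.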

One other comment on the byte registers with setcc -- I had forgotten about the
high byte registers (ah, bh, ch, dh) until one of your posts reminded me of
them. However, using them incurs penalties, so it's not worth trying. I have not
yet tried implementing your other code; I think it can be done more efficiently
using only 32-bit arithmetic and cmovcc.

-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.