Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: a question about speed difference that I do not understand

Author: Ed Schröder
Date: 03:27:00 12/06/01
On December 05, 2001 at 15:21:37, Dieter Buerssner wrote:

>On December 05, 2001 at 14:45:54, Ed Schröder wrote:
>
>>On December 05, 2001 at 14:11:34, Dieter Buerssner wrote:
>>
>>>On December 05, 2001 at 13:55:19, Ed Schröder wrote:
>>>
>>>>Well, I never leave it up to the compiler to decide things for me but that's
>>>>easy becausese my background is assembler and my way of programming in C++
>>>>still is as close to assembler as possible. In your case I would program it
>>>>as follows:
>>>>
>>>>static char direction [65*65];       // create an area big enough, using char
>>>
>>>For zero based squares, this data structure is too big. 64*
>>>64 is enough. Also, becuase 64^2 is a power of two, it may give better >alignment for following data.
>>
>>See my other posting about the why taking somewhat overhead space. It is
>>harmless for the speed of a program, allignment has nothing to do with
>>it, I just waste some memory space to avoid bugs.
>>
>>
>>
>>>>To access the table:
>>>>
>>>>int x,y, char result;
>>>>
>>>>result = direction [x<<6][y];
>>>
>>>This is an syntax error. You probably mean direction[x<<6+y].
>>
>>You are right, a bug :)
>>
>>
>>>>That is all. Any good compiler will produce fast code, something like:
>>>>
>>>>mov    EAX,x
>>>>mov    EBX,y
>>>>shl    EAX,6
>>>>add    EAX,EBX
>>>>mov    CL,direction[EAX]
>>>>mov    result,CL
>
>[snipped]
>
>>I can not share your opinion although your point about the lea instruction is
>>true and my above listed assembler code surely can be improved.
>>
>>The net gain to move C to ASM is still 30% if not more. I have checked MSVC 6
>>and the ASM code it generates is to cry about.
>
>I won't argue with this. I have functions (not for chess, for high precision
>arithmetics), that are a factor of 10 faster in assembler (mainly because there
>is no access to the carry bit from high level laguage).
>
>However, I stay with my suggestion about the low level tricks. I just tried
>exactly this example with the 2 dimensional array and with the one dimensional
>array and the shift like you suggested. I expected no difference.
>
>static unsigned direction[64][64];
>
>unsigned foo(int i, int j)
>{
>  return direction[i][j];
>}
>
>static unsigned dir2[64*64];
>
>unsigned bar(int i, int j)
>{
>  return dir2[(i<<6)+j];
>}
>
>Assembler output:
>
>_foo    PROC NEAR
>; File foo.c
>; Line 4
>        push    ebp
>        mov     ebp, esp
>; Line 5
>        mov     eax, DWORD PTR _i$[ebp]
>        shl     eax, 8
>        mov     ecx, DWORD PTR _j$[ebp]
>        mov     eax, DWORD PTR _direction[eax+ecx*4]
>; Line 6
>        pop     ebp
>        ret     0
>_foo    ENDP
>_TEXT   ENDS
>PUBLIC  _bar
>_BSS    SEGMENT
>_dir2   DD      01000H DUP (?)
>_BSS    ENDS
>_TEXT   SEGMENT
>_i$ = 8
>_j$ = 12
>_bar    PROC NEAR
>; Line 11
>        push    ebp
>        mov     ebp, esp
>; Line 12
>        mov     eax, DWORD PTR _i$[ebp]
>        shl     eax, 6
>        add     eax, DWORD PTR _j$[ebp]
>        mov     eax, DWORD PTR _dir2[eax*4]
>; Line 13
>        pop     ebp
>        ret     0
>_bar    ENDP
>
>So, would you still think, the shift trick is worthwhile? When i and j are
>already in registers, the more straightforward aproach may even be faster.
>
>; res = direction[i][j]
>        mov     edx, eax ; i is in eax
>        shl     edx, 8
>        mov     edx, DWORD PTR _direction[edx+ecx*4] ; assume j is in ecx;
>                                                     ; result goes to edx;
>
>
>I can at least see no way, that it would be slower here.
>
>BTW. Similar results, when changing the type to unsigned char.

You are right, there is no speed difference in the 2 approaches, my compiler
generates similar code.

2 remarks, if you change the table size to a value not equal to a power of 2
you suddenly will see an expensive IMUL instruction appear in the ASM code.
Secondly the fact my chosen example was not a very good one to make a point
does not mean the low-level approach is inferior to your way.

A hopeful better example of a low-level approach in C++. Suppose you want
to generate moves for the white pieces and your piece-type definition is as
follows:

White_pawn = 1
White_knight = 2
White_bishop = 4
White_rook = 8
White_queen = 16
White_king =32

The traditional way:

Declarations...

void WP (void);          // rouitine that creates moves for white pawns
void WN (void);          // rouitine that creates moves for white knights
void WB (void);
void WR (void);
void WQ (void);
void WK (void);

main ()

{       int piece;

        switch(piece) { case  1 : WP();
                        case  2 : WN();
                        case  4 : WB();
                        case  8 : WR();
                        case 16 : WQ();
                        case 32 : WK(); }
}

The low-level approach, define the calls into a table and make use of the
processors possibility of an indirect call.

Declarations...

void WP (void);          // rouitine that creates moves for white pawns
void WN (void);          // rouitine that creates moves for white knights
void WB (void);
void WR (void);
void WQ (void);
void WK (void);
void error (void);

void    (*jump[])() =   { error,WP,WN,error,WB,error,error,error,
                          WR,error,error,error,error,error,error,error,
                          WQ,error,error,error,error,error,error,error,
                          error,error,error,error,error,error,error,error,
                          WK };
main ()

{       int piece;

        (jump[piece])();

}

One instruction it needs.

Ed



>Regards,
>Dieter
Re: a question about speed difference that I do not understand Gerd Isenberg 07:20:37 12/06/01
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.