Author: Ed Schröder
Date: 03:27:00 12/06/01
Go up one level in this thread
On December 05, 2001 at 15:21:37, Dieter Buerssner wrote:
>On December 05, 2001 at 14:45:54, Ed Schröder wrote:
>
>>On December 05, 2001 at 14:11:34, Dieter Buerssner wrote:
>>
>>>On December 05, 2001 at 13:55:19, Ed Schröder wrote:
>>>
>>>>Well, I never leave it up to the compiler to decide things for me but that's
>>>>easy becausese my background is assembler and my way of programming in C++
>>>>still is as close to assembler as possible. In your case I would program it
>>>>as follows:
>>>>
>>>>static char direction [65*65]; // create an area big enough, using char
>>>
>>>For zero based squares, this data structure is too big. 64*
>>>64 is enough. Also, becuase 64^2 is a power of two, it may give better >alignment for following data.
>>
>>See my other posting about the why taking somewhat overhead space. It is
>>harmless for the speed of a program, allignment has nothing to do with
>>it, I just waste some memory space to avoid bugs.
>>
>>
>>
>>>>To access the table:
>>>>
>>>>int x,y, char result;
>>>>
>>>>result = direction [x<<6][y];
>>>
>>>This is an syntax error. You probably mean direction[x<<6+y].
>>
>>You are right, a bug :)
>>
>>
>>>>That is all. Any good compiler will produce fast code, something like:
>>>>
>>>>mov EAX,x
>>>>mov EBX,y
>>>>shl EAX,6
>>>>add EAX,EBX
>>>>mov CL,direction[EAX]
>>>>mov result,CL
>
>[snipped]
>
>>I can not share your opinion although your point about the lea instruction is
>>true and my above listed assembler code surely can be improved.
>>
>>The net gain to move C to ASM is still 30% if not more. I have checked MSVC 6
>>and the ASM code it generates is to cry about.
>
>I won't argue with this. I have functions (not for chess, for high precision
>arithmetics), that are a factor of 10 faster in assembler (mainly because there
>is no access to the carry bit from high level laguage).
>
>However, I stay with my suggestion about the low level tricks. I just tried
>exactly this example with the 2 dimensional array and with the one dimensional
>array and the shift like you suggested. I expected no difference.
>
>static unsigned direction[64][64];
>
>unsigned foo(int i, int j)
>{
> return direction[i][j];
>}
>
>static unsigned dir2[64*64];
>
>unsigned bar(int i, int j)
>{
> return dir2[(i<<6)+j];
>}
>
>Assembler output:
>
>_foo PROC NEAR
>; File foo.c
>; Line 4
> push ebp
> mov ebp, esp
>; Line 5
> mov eax, DWORD PTR _i$[ebp]
> shl eax, 8
> mov ecx, DWORD PTR _j$[ebp]
> mov eax, DWORD PTR _direction[eax+ecx*4]
>; Line 6
> pop ebp
> ret 0
>_foo ENDP
>_TEXT ENDS
>PUBLIC _bar
>_BSS SEGMENT
>_dir2 DD 01000H DUP (?)
>_BSS ENDS
>_TEXT SEGMENT
>_i$ = 8
>_j$ = 12
>_bar PROC NEAR
>; Line 11
> push ebp
> mov ebp, esp
>; Line 12
> mov eax, DWORD PTR _i$[ebp]
> shl eax, 6
> add eax, DWORD PTR _j$[ebp]
> mov eax, DWORD PTR _dir2[eax*4]
>; Line 13
> pop ebp
> ret 0
>_bar ENDP
>
>So, would you still think, the shift trick is worthwhile? When i and j are
>already in registers, the more straightforward aproach may even be faster.
>
>; res = direction[i][j]
> mov edx, eax ; i is in eax
> shl edx, 8
> mov edx, DWORD PTR _direction[edx+ecx*4] ; assume j is in ecx;
> ; result goes to edx;
>
>
>I can at least see no way, that it would be slower here.
>
>BTW. Similar results, when changing the type to unsigned char.
You are right, there is no speed difference in the 2 approaches, my compiler
generates similar code.
2 remarks, if you change the table size to a value not equal to a power of 2
you suddenly will see an expensive IMUL instruction appear in the ASM code.
Secondly the fact my chosen example was not a very good one to make a point
does not mean the low-level approach is inferior to your way.
A hopeful better example of a low-level approach in C++. Suppose you want
to generate moves for the white pieces and your piece-type definition is as
follows:
White_pawn = 1
White_knight = 2
White_bishop = 4
White_rook = 8
White_queen = 16
White_king =32
The traditional way:
Declarations...
void WP (void); // rouitine that creates moves for white pawns
void WN (void); // rouitine that creates moves for white knights
void WB (void);
void WR (void);
void WQ (void);
void WK (void);
main ()
{ int piece;
switch(piece) { case 1 : WP();
case 2 : WN();
case 4 : WB();
case 8 : WR();
case 16 : WQ();
case 32 : WK(); }
}
The low-level approach, define the calls into a table and make use of the
processors possibility of an indirect call.
Declarations...
void WP (void); // rouitine that creates moves for white pawns
void WN (void); // rouitine that creates moves for white knights
void WB (void);
void WR (void);
void WQ (void);
void WK (void);
void error (void);
void (*jump[])() = { error,WP,WN,error,WB,error,error,error,
WR,error,error,error,error,error,error,error,
WQ,error,error,error,error,error,error,error,
error,error,error,error,error,error,error,error,
WK };
main ()
{ int piece;
(jump[piece])();
}
One instruction it needs.
Ed
>Regards,
>Dieter
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.