Computer Chess Club Archives


Subject: Re: 64Bit optimize coding - My experience (AMD64)

Author: Gerd Isenberg

Date: 08:41:00 12/21/05


On December 21, 2005 at 09:51:59, Daniel Mehrmann wrote:

>Hello,
>
>I started playing with 64-bit coding a few weeks ago and have collected some
>experience with my engine Homer. I don't know whether it will work for you or
>help you.
>
>First I read a lot of material on the web about 64-bit coding and made my own
>experiments. I think the key to high-performance 64-bit programming is the
>integer registers and how to use them effectively.
>
>I have a lot of tables and table lookups in my engine. My idea here is to move
>them to 64 bits. In the example code I have tables on which I perform a
>bitwise AND operation.
>
>Standard 32Bit:
>
>int   a1[2048];
>int   a2[2048];
>int   a3[2048];
>
>for (int i = 0; i < 2048; ++i)
>    a3[i] = a1[i] & a2[i];
>
>Changing int to long long (or unsigned __int64) here brings a big speedup in
>64-bit mode:
>
>pure 64Bit:
>
>long long  a1[1024];
>long long  a2[1024];
>long long  a3[1024];
>
>for (int i = 0; i < 1024; ++i)
>     a3[i] = a1[i] & a2[i];
>
>I didn't change the total size of the bit-set block; mostly I don't need that
>much space, but here speed is more important ;)

Hi Daniel,

Looping half as many times with 64-bit words, which is a kind of SIMD, does of
course gain a lot. You may even win some more cycles if you make your index i
unsigned and do some unrolling:

long long  a1[1024];
long long  a2[1024];
long long  a3[1024];

for (unsigned int i = 0; i < 1024; i+=4) {
     a3[i+0] = a1[i+0] & a2[i+0];
     a3[i+1] = a1[i+1] & a2[i+1];
     a3[i+2] = a1[i+2] & a2[i+2];
     a3[i+3] = a1[i+3] & a2[i+3];
}

But you changed your basic type from int to long long.

So your former
int_a[evenIndex] becomes (int)ll_a[evenIndex/2];  and
int_a[oddIndex]  becomes (int)(ll_a[oddIndex/2]>>32);
(assuming little-endian layout, as on AMD64).

So you have to change some places in your source.
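
A minimal sketch of that index mapping, assuming a little-endian target such as AMD64. The names pack_ints and get_int are hypothetical helpers for illustration, not anything from Homer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack two former int entries into one 64-bit word,
   even index in the low half, odd index in the high half (little-endian). */
static uint64_t pack_ints(int32_t even, int32_t odd)
{
    return (uint64_t)(uint32_t)even | ((uint64_t)(uint32_t)odd << 32);
}

/* Hypothetical helper: read the former int_a[i] from the 64-bit table ll_a. */
static int32_t get_int(const uint64_t *ll_a, unsigned i)
{
    assert(ll_a != NULL);
    uint64_t word = ll_a[i / 2];     /* two ints per 64-bit word */
    if (i & 1u)
        word >>= 32;                 /* odd index: take the high half */
    return (int32_t)(uint32_t)word;  /* keep the low 32 bits */
}
```

Every place that used to read int_a[i] would go through something like get_int(ll_a, i), which is why keeping the int view of the table may be less work.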

I suggest making an anonymous union, or leaving the old int declaration,
forcing 64-bit alignment, and casting or aliasing int* to long long* in your
"and"-routine.

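One way the union idea could look, as a rough sketch only. BitTable, N_INTS and and_tables are made-up names, and this uses a named union type rather than an anonymous one to keep the example short. Reading a member other than the one last written is allowed in C; in C++ it is formally undefined, although most compilers support it:

```c
#include <assert.h>
#include <stdint.h>

#define N_INTS 2048

/* The unrolled loop below handles four 64-bit words (eight ints) per pass. */
static_assert(N_INTS % 8 == 0, "N_INTS must be a multiple of 8");

/* Hypothetical table type: the same storage viewed either as 2048 ints
   (for the existing lookup code) or as 1024 64-bit words (for the AND loop).
   The uint64_t member forces 8-byte alignment of the whole union. */
typedef union {
    int32_t  i32[N_INTS];
    uint64_t u64[N_INTS / 2];
} BitTable;

/* a3 = a1 & a2, processed 64 bits at a time and unrolled by four. */
static void and_tables(BitTable *a3, const BitTable *a1, const BitTable *a2)
{
    for (unsigned i = 0; i < N_INTS / 2; i += 4) {
        a3->u64[i + 0] = a1->u64[i + 0] & a2->u64[i + 0];
        a3->u64[i + 1] = a1->u64[i + 1] & a2->u64[i + 1];
        a3->u64[i + 2] = a1->u64[i + 2] & a2->u64[i + 2];
        a3->u64[i + 3] = a1->u64[i + 3] & a2->u64[i + 3];
    }
}
```

The rest of the engine keeps indexing table.i32[...] as before, and only the "and"-routine touches the 64-bit view.
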
Of course such a loop is also a target for Intel's SSE2 vectorizer.
Does gcc have SSE2 intrinsics?
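
For reference, GCC does ship the SSE2 intrinsics in <emmintrin.h>. A hand-written version of the loop might look roughly like this; and_words is a hypothetical name, the unaligned load/store forms are used so no alignment is assumed, and a plain scalar loop handles the tail or a non-SSE2 build:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)        /* macro defined by GCC and Clang when SSE2 is enabled */
#include <emmintrin.h>       /* SSE2 intrinsics */
#endif

/* Hypothetical routine: dst[i] = x[i] & y[i] for n 64-bit words. */
static void and_words(uint64_t *dst, const uint64_t *x, const uint64_t *y,
                      size_t n)
{
    assert(dst != NULL && x != NULL && y != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    /* Two 64-bit words (128 bits) per iteration. */
    for (; i + 2 <= n; i += 2) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_and_si128(vx, vy));
    }
#endif
    /* Scalar tail, or the whole loop when SSE2 is not available. */
    for (; i < n; ++i)
        dst[i] = x[i] & y[i];
}
```

Whether this beats the unrolled 64-bit loop on a given AMD64 chip is something to measure rather than assume.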

Gerd

>
>
>I think there is also a way to do this for bit-counting stuff, but I'm
>currently testing ;)
>
>Best,
>Daniel




Last modified: Thu, 15 Apr 21 08:11:13 -0700
