Author: Gerd Isenberg
Date: 08:41:00 12/21/05
On December 21, 2005 at 09:51:59, Daniel Mehrmannn wrote:
>Hello,
>
>I started playing with 64-bit coding a few weeks ago and have gathered some
>experience with my engine Homer. I don't know whether it will work for you or
>help you.
>
>First I read a lot of material on the web about 64-bit coding and made my own
>experiments. I think the key to high-performance 64-bit programming is the
>integer registers and how effectively we use them.
>
>I have a lot of tables and table lookups in my engine. My idea is to move them
>toward 64-bit. In the example code I have tables on which I perform a bitwise
>AND operation.
>
>Standard 32Bit:
>
>int a1[2048];
>int a2[2048];
>int a3[2048];
>
>for (int i = 0; i < 2048; ++i)
> a3[i] = a1[i] & a2[i];
>
>Changing int to long long (or unsigned __int64) here brings a lot of speedup in
>64-bit:
>
>pure 64Bit:
>
>long long a1[1024];
>long long a2[1024];
>long long a3[1024];
>
>for (int i = 0; i < 1024; ++i)
> a3[i] = a1[i] & a2[i];
>
>I don't change the total size of the bit-set block. Mostly I don't need that
>much space, but here speed is more important ;)
Hi Daniel,
looping half as many times with 64-bit words, which is a kind of SIMD, does of
course gain a lot. You may even win some more cycles if you make your index i
unsigned and do some unrolling:
long long a1[1024];
long long a2[1024];
long long a3[1024];

for (unsigned int i = 0; i < 1024; i += 4) {
    a3[i+0] = a1[i+0] & a2[i+0];
    a3[i+1] = a1[i+1] & a2[i+1];
    a3[i+2] = a1[i+2] & a2[i+2];
    a3[i+3] = a1[i+3] & a2[i+3];
}
But you changed your basic type from int to long long.
So your former
int_a[evenIndex] becomes (int)ll_a[evenIndex/2]; and
int_a[oddIndex] becomes (int)(ll_a[oddIndex/2]>>32);
(with the little-endian layout of x86, where the even entry sits in the low half).
So you have to change some places in your source.
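The index mapping above can be sketched as a pair of small helpers. The names get_int and pack_pair are hypothetical, not taken from Homer. The sketch assumes the little-endian packing described above (even entries in the low 32 bits, odd entries in the high 32 bits) and relies on the usual two's-complement conversion from uint32_t to int32_t:

```c
#include <stdint.h>

/* Hypothetical helper: pack two 32-bit entries into one 64-bit word,
   even entry in the low half, odd entry in the high half. */
static uint64_t pack_pair(int32_t even_entry, int32_t odd_entry)
{
    return (uint64_t)(uint32_t)even_entry
         | ((uint64_t)(uint32_t)odd_entry << 32);
}

/* Hypothetical helper: read the entry that used to live at int_a[i]
   from the packed 64-bit table ll_a. */
static int32_t get_int(const uint64_t *ll_a, unsigned i)
{
    uint64_t word = ll_a[i / 2];          /* two old entries per word   */
    if (i & 1)
        word >>= 32;                      /* odd index: take high half  */
    return (int32_t)(uint32_t)word;       /* truncate to the low 32 bits */
}
```

Hiding the shift behind one accessor keeps the places that still need single 32-bit entries readable, at the cost of a branch or shift per lookup.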
I suggest making an anonymous union, or leaving the old int declaration
(forcing 64-bit alignment) and casting or aliasing int* to long long* in your
"and" routine.
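A minimal sketch of the union variant, assuming C rather than C++ (reading a union member other than the one last written is permitted in C). The names BitTable, N_INTS and and_tables are hypothetical. The union is named here so it can be passed around; the uint64_t member already forces 8-byte alignment:

```c
#include <stdint.h>

#define N_INTS 2048   /* size of the old int tables */

/* Hypothetical type: the same storage seen as 32-bit entries or 64-bit words. */
typedef union {
    int32_t  i32[N_INTS];       /* old view: existing lookups stay unchanged */
    uint64_t u64[N_INTS / 2];   /* wide view: used only by the AND routine   */
} BitTable;

/* dst = a & b, processing two 32-bit entries per iteration. */
static void and_tables(BitTable *dst, const BitTable *a, const BitTable *b)
{
    for (unsigned i = 0; i < N_INTS / 2; ++i)
        dst->u64[i] = a->u64[i] & b->u64[i];
}
```

Because AND works bit by bit, the result seen through i32 is the same whatever the byte order, so only the bulk routine needs to know about the 64-bit view.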
Of course such a loop is also a target for Intel's SSE2 vectorizer.
Does gcc have SSE2 intrinsics?
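On the gcc question: gcc does ship the SSE2 intrinsics in <emmintrin.h> (enabled by default on x86-64, with -msse2 on 32-bit x86). A hedged sketch of the same AND loop with them follows. The name and_words_sse2 is hypothetical, unaligned loads and stores are used so no alignment is assumed, and a scalar loop handles the remainder (or everything, if SSE2 is not available):

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical routine: dst[i] = a[i] & b[i] for n 64-bit words. */
void and_words_sse2(uint64_t *dst, const uint64_t *a,
                    const uint64_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Two 64-bit words (128 bits) per iteration. */
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_and_si128(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when SSE2 is not enabled. */
    for (; i < n; ++i)
        dst[i] = a[i] & b[i];
}
```

If the tables are known to be 16-byte aligned, _mm_load_si128 and _mm_store_si128 could replace the unaligned variants.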
Gerd
>
>
>I think there is also a way to do this for bit-counting stuff. But I'm
>currently testing ;)
>
>Best,
>Daniel
Last modified: Thu, 15 Apr 21 08:11:13 -0700