Author: Gerd Isenberg
Date: 08:41:00 12/21/05
On December 21, 2005 at 09:51:59, Daniel Mehrmannn wrote:

>Hello,
>
>I started playing with 64-bit coding a few weeks ago and have collected some
>experience with my engine Homer. I don't know if it works for you or if it helps
>you.
>
>First I read a lot of material on the web about 64-bit coding and made my own
>attempts. I think the key to high-performance 64-bit programming is the
>integer registers and how to use them effectively.
>
>I have a lot of tables and table lookups in my engine. My idea here is to move
>them toward 64 bits. In the example code I have tables on which I perform a
>bitwise AND operation.
>
>Standard 32-bit:
>
>int a1[2048];
>int a2[2048];
>int a3[2048];
>
>for (int i = 0; i < 2048; ++i)
>    a3[i] = a1[i] & a2[i];
>
>Changing int to long long (unsigned __int64) here brings a big speedup in
>64-bit:
>
>pure 64-bit:
>
>long long a1[1024];
>long long a2[1024];
>long long a3[1024];
>
>for (int i = 0; i < 1024; ++i)
>    a3[i] = a1[i] & a2[i];
>
>I don't change the total size of the bit set block; mostly I don't need that much
>space, but here speed is more important ;)

Hi Daniel,

looping half as many times with 64-bit words, which is a kind of SIMD, does of course gain a lot. You may even win some more cycles if you make your index i unsigned and do some unrolling:

long long a1[1024];
long long a2[1024];
long long a3[1024];

for (unsigned int i = 0; i < 1024; i+=4)
{
    a3[i+0] = a1[i+0] & a2[i+0];
    a3[i+1] = a1[i+1] & a2[i+1];
    a3[i+2] = a1[i+2] & a2[i+2];
    a3[i+3] = a1[i+3] & a2[i+3];
}

But you changed your basic type from int to long long. So your former

    int_a[evenIndex]

becomes

    (int)ll_a[evenIndex/2];

and

    int_a[oddIndex]

becomes

    (int)(ll_a[oddIndex/2]>>32);

So you have to change some places in your source. I suggest making an anonymous union, or leaving the old int declaration while forcing 64-bit alignment and casting or aliasing int* to long long* in your "and" routine.

Of course such a loop is also a target for Intel's SSE2 vectorizer.
Does gcc have SSE2 intrinsics?

Gerd

>I think there is also a way for bit-counting stuff. But I'm currently testing
>;)
>
>Best,
>Daniel
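
The union suggestion in the reply above could look roughly like the following. This is only a sketch under some assumptions: the names BitTable, and_tables and N_INTS are made up for illustration and do not come from Homer, the fixed-width types from <stdint.h> stand in for int and long long, and a named union is used rather than an anonymous one. Reading a union member other than the one last written is permitted in C; ISO C++ is stricter about it, although common compilers accept it.

```c
#include <assert.h>
#include <stdint.h>

#define N_INTS 2048u   /* hypothetical table size, taken from the quoted example */

/* One table, two views: the engine keeps indexing i32[],
   while the AND routine walks the same memory as 64-bit words.
   The union is aligned for its widest member, so u64[] is 64-bit aligned. */
typedef union {
    int32_t  i32[N_INTS];
    uint64_t u64[N_INTS / 2];
} BitTable;

/* dst = a & b, processed 64 bits at a time and unrolled by four.
   N_INTS / 2 must be a multiple of 4 for this unrolling to be exact. */
static void and_tables(BitTable *dst, const BitTable *a, const BitTable *b)
{
    assert(dst && a && b);
    for (unsigned int i = 0; i < N_INTS / 2; i += 4) {
        dst->u64[i + 0] = a->u64[i + 0] & b->u64[i + 0];
        dst->u64[i + 1] = a->u64[i + 1] & b->u64[i + 1];
        dst->u64[i + 2] = a->u64[i + 2] & b->u64[i + 2];
        dst->u64[i + 3] = a->u64[i + 3] & b->u64[i + 3];
    }
}
```

With this layout the rest of the source can keep reading table.i32[k] unchanged, so the evenIndex/oddIndex shifts are not needed outside the AND routine. Because a bitwise AND acts on each bit independently, the result seen through i32[] is the same as the 32-bit loop would produce, regardless of byte order.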
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.