Author: Gerd Isenberg
Date: 15:55:00 10/14/02
On October 14, 2002 at 16:31:38, Russell Reagan wrote:

>This is very interesting Gerd.
>
>Which approach do you think will work the fastest on 64-bit cpus? Rotated, MMX
>non-rotated, or plain old non-rotated? A lot of people say rotated are fastest,
>now there is this MMX that is at least on par with rotated bitboards, and I
>remember reading a while back that some people were using non-rotated bitboards
>and doing classic style board scanning to find attacks and such, and that this
>old-non-rotated method worked fastest for them (maybe because of better cache
>hits?).
>
>So which do you think will be the winner on 64-bit machines? We have all of this
>talk about which is the fastest on 32-bit machines, but before too long 64-bit
>will be here and all of the conclusions we've drawn might be turned upside down.
>What if plain old non-rotated bitboards is the winner on 64-bit hardware?
>
>Any thoughts?
>
>Russell

Hi Russell,

Rotated are fast. Absolutely the shortest code, but access to rather huge lookup arrays!?

I believe that with 64-bit CPUs, and already today with the MMX performance of AMD's Athlon, floodfill (dumb7fill) or Kogge-Stone based algorithms will win.

One benefit of these algorithms is that up to eight independent directions can be processed simultaneously in several integer or MMX integer pipes. But their main benefit is the ability to generate the attacks of several pieces of one kind and color at once. So it's very cheap to generate an attack bitboard of all pieces of one color:

    getRookAttacks  (wrooks|wqueens)
  | getBishopAttacks(wbishops|wqueens)
  | getKnightAttacks(wknights)
  | getKingAttacks  (wking)
  | getWPawnAttacks (wpawns)

It may be interesting to call the attack getters repeatedly, passing the return value, ANDed with some mask, back in as the parameter, to find which pieces can reach some targets in n moves or captures.

Instruction sequences like this are currently damned fast on the Athlon, which seems to execute up to four of them in parallel; the P4 needs almost twice the time for this:

    // 2. diagonal fill in each direction
    psllq mm1, 9    ; rightup
    psrlq mm4, 7    ; rightdown
    psllq mm2, 7    ; leftup
    psrlq mm3, 9    ; leftdown
    por   mm0, mm1  ; bishopAttacks |= rightup
    por   mm0, mm4  ; bishopAttacks |= rightdown
    por   mm0, mm2  ; bishopAttacks |= leftup
    por   mm0, mm3  ; bishopAttacks |= leftdown
    pand  mm1, mm5  ; clear rightup occupied or h file
    pand  mm4, mm5  ; clear rightdown occupied or h file
    pand  mm2, mm7  ; clear leftup occupied or a file
    pand  mm3, mm7  ; clear leftdown occupied or a file

Some current MMX drawbacks, like the expensive vector-path "movd" instructions needed to pass MMX registers to reg32, will disappear with Hammer. Hammer will have 16! 128-bit XMM registers, with SSE2 and 3DNow! instructions. You can then generate two attack sets for one kind of piece simultaneously.

Regards,
Gerd
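
For readers without MMX at hand, the diagonal fill step shown in the MMX sequence above can be sketched in portable C on a plain 64-bit integer. This is only an illustration, not the code from the post: it assumes a square mapping with a1 = bit 0 and h1 = bit 7 (which is consistent with a left shift by 9 meaning "rightup"), and the names `bishop_attacks_fill`, `NOT_A_FILE` and `NOT_H_FILE` are hypothetical. The initial masking of the sliders is an assumption about the step that precedes the quoted fragment.

```c
#include <stdint.h>

typedef uint64_t U64;

/* Assumed mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56, h8 = bit 63. */
static const U64 NOT_A_FILE = 0xfefefefefefefefeULL;
static const U64 NOT_H_FILE = 0x7f7f7f7f7f7f7f7fULL;

/*
 * Dumb7fill-style bishop attacks for ALL sliders in `sliders` at once.
 * `empty` has a bit set for every unoccupied square.
 * Each of the seven rounds mirrors the MMX fragment: shift the four
 * direction sets one step, OR them into the result, then keep only
 * the bits that may travel further (empty and not on the edge file
 * that would wrap on the next shift).
 */
U64 bishop_attacks_fill(U64 sliders, U64 empty)
{
    U64 attacks = 0;

    /* Pieces on the h file cannot move right, pieces on the a file not left. */
    U64 rightup   = sliders & NOT_H_FILE;
    U64 rightdown = sliders & NOT_H_FILE;
    U64 leftup    = sliders & NOT_A_FILE;
    U64 leftdown  = sliders & NOT_A_FILE;

    const U64 go_right = empty & NOT_H_FILE; /* role of mm5 in the post */
    const U64 go_left  = empty & NOT_A_FILE; /* role of mm7 in the post */

    for (int step = 0; step < 7; ++step) {
        rightup   <<= 9;
        rightdown >>= 7;
        leftup    <<= 7;
        leftdown  >>= 9;

        attacks |= rightup | rightdown | leftup | leftdown;

        rightup   &= go_right;
        rightdown &= go_right;
        leftup    &= go_left;
        leftdown  &= go_left;
    }
    return attacks;
}
```

Because the argument is a set of sliders rather than a single square, a call such as `bishop_attacks_fill(wbishops | wqueens, empty)` yields the combined diagonal attacks of all those pieces in one pass, which is the property the post singles out as the main benefit. The first blocker in each direction is included in the result, since the OR happens before the mask.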
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.