Author: Sander de Zoete
Date: 10:26:13 01/24/03
Go up one level in this thread
On January 24, 2003 at 02:01:26, Matt Taylor wrote: >It was a nice idea. It works well in theory, but it doesn't work quite as well >as other methods in practice for two reasons: indexing and memory latency. > >It is generally best on a machine to use word-size indices to access memory to >avoid this problem. To generate the pointer to load from, the machine has to >mask (movzx in the case of x86 -- to preserve original value), add, and then >load. However, it is also not possible to fit a table for a word-sized index >into memory. > >The other problem is memory latency. Assuming only cache hits to the L1 cache, >this is 2-3 cycles of latency on the top-of-the-line chips. The memory accesses >are independent and can be overlapped -- only as long as there is no dependency. >This means that each byte flip is going to eat at least 3 cycles. 3x4 (assuming >perfect overlapping between the two 64-bit halves) is 12 cycles + setup latency. > >The logarithmic method I implemented and posted in assembly is 20 clocks, and >I'm quite sure the time can be reduced at least to 16 clocks or so on Athlon. >Practically, this is about the same as your table index routine, but it doesn't >use extra memory. Making use of 2 bswaps to reverse the bytes, you only need a >register swap + 3 iterations of the logarithmic reverse (to reverse each byte). >The register swap disappears on x86-64, and more parallelism might be extracted >to make it even faster. (Trickey -should- allow the clever x86 assembly >programmer to get the swap for free!) > >If anyone is -really- interested, I'll optimize the bit reverse function >further. For Athlon, of course. > >-Matt Hi Matt, I will be very interested ofcourse. The only thing is, that I'm using a pentium 4. But anyhow, I'm interested in the code. So if you finish it, could you please send me a copy. s.de.zoete@tiscali.nl I would be very grateful. I will use it in my reversed bitboard based program Preatorian (not finished yet). Sander.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.