Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: How to reverse a Reverse BitBoard! {just for fun - more bit reverse}

Author: Matt Taylor

Date: 23:01:26 01/23/03

Go up one level in this thread


It was a nice idea. It works well in theory, but it doesn't work quite as well
as other methods in practice for two reasons: indexing and memory latency.

It is generally best on a machine to use word-size indices to access memory to
avoid this problem. To generate the pointer to load from, the machine has to
mask (movzx in the case of x86 -- to preserve original value), add, and then
load. However, it is also not possible to fit a table for a word-sized index
into memory.

The other problem is memory latency. Assuming only cache hits to the L1 cache,
this is 2-3 cycles of latency on the top-of-the-line chips. The memory accesses
are independent and can be overlapped -- only as long as there is no dependency.
This means that each byte flip is going to eat at least 3 cycles. 3x4 (assuming
perfect overlapping between the two 64-bit halves) is 12 cycles + setup latency.

The logarithmic method I implemented and posted in assembly is 20 clocks, and
I'm quite sure the time can be reduced at least to 16 clocks or so on Athlon.
Practically, this is about the same as your table index routine, but it doesn't
use extra memory. Making use of 2 bswaps to reverse the bytes, you only need a
register swap + 3 iterations of the logarithmic reverse (to reverse each byte).
The register swap disappears on x86-64, and more parallelism might be extracted
to make it even faster. (Trickey -should- allow the clever x86 assembly
programmer to get the swap for free!)

If anyone is -really- interested, I'll optimize the bit reverse function
further. For Athlon, of course.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.