Computer Chess Club Archives


Subject: Re: Population of disjoint Attacksets

Author: Dieter Buerssner

Date: 12:28:33 06/01/04


On May 31, 2004 at 10:07:42, Gerd Isenberg wrote:

>I tried an MMX version based on the "dead slow" popcount of AMD's optimization
>manual, with the eight odd/major pairs. Even that takes about 42 ns -> 2.1
>ns/32-bit. To get an idea of what is possible with AMD64 gp registers!

Hi Gerd,
you won :-) I get 2.4 ns on my P4 2.53 GHz with your code. I cannot beat this
with my means, that is, without assembly. Even with assembly I probably could
not beat it, because my assembly knowledge predates the MMX instructions. I
actually think the compilers already produce pretty good assembly from my C
code (in this case). I guess that coding my routine with MMX instructions would
have a chance. One has to "double" the masks, use 64-bit registers, and add one
stage (which should not cost much). The first stages would then be done 3 times
on three 64-bit words instead of 6 times, plus one "odd" 64-bit word. Perhaps I
am going to try it, following your code. Also, in real 64-bit environments, I
think my idea should yield almost double the speed of the 32-bit algorithm (not
quite double, because of the one added round in the algorithm). Your original
code with maj/odd should at least double in speed, however.
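
To illustrate what "doubling the masks and adding one stage" can look like,
here is a minimal sketch in plain C. It is a generic textbook SWAR popcount,
not the routine discussed in this thread; the names popcount32, popcount64 and
popcount3 are made up for the example. popcount3 shows the odd/major idea on
three words, using the identity
popcount(a) + popcount(b) + popcount(c) = popcount(odd) + 2 * popcount(maj).

```c
#include <stdint.h>

/* 32-bit SWAR popcount: five folding stages (field widths 1, 2, 4, 8, 16). */
unsigned popcount32(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return (unsigned)x;
}

/* 64-bit variant: the same masks "doubled" to 64 bits, plus one more stage. */
unsigned popcount64(uint64_t x)
{
    x = (x & 0x5555555555555555ull) + ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x & 0x0F0F0F0F0F0F0F0Full) + ((x >> 4) & 0x0F0F0F0F0F0F0F0Full);
    x = (x & 0x00FF00FF00FF00FFull) + ((x >> 8) & 0x00FF00FF00FF00FFull);
    x = (x & 0x0000FFFF0000FFFFull) + ((x >> 16) & 0x0000FFFF0000FFFFull);
    x = (x & 0x00000000FFFFFFFFull) + (x >> 32);   /* the added stage */
    return (unsigned)x;
}

/* Odd/major (carry-save) step: three words are reduced to two before counting.
   odd holds the sum bit of each column, maj holds the carry bit. */
unsigned popcount3(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t odd = a ^ b ^ c;
    uint64_t maj = (a & b) | (c & (a ^ b));
    return popcount64(odd) + 2u * popcount64(maj);
}
```

The 64-bit version does six stages per word instead of five, but each word
covers twice as many bits, which is why the speedup would be a bit less than
a factor of two.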

BTW, trying your code was not without pitfalls. This was the first time I
tried MMX inline. In my timing program, the outputs were wrong at first. The
reason was that I used floating point for the times, and this did not mix with
the MMX instructions. So I had to find out that _mm_empty() must be added at
the right place. I used the free VC command line tools for the tests now (but
the times for the other functions discussed did not really change compared to
VC6).
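
For readers who hit the same pitfall: the MMX registers share their storage
with the x87 floating-point registers, so the MMX state has to be cleared
before any floating-point code runs. The following is only a sketch of where
the call might go, not the timing program from this post. The names
count_word, leave_mmx_state and average_bits are hypothetical, and the
non-MMX branch is just a fallback so that the example builds on other
targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>

/* One SWAR stage in an MMX register: (x & m) + ((x >> s) & m). */
static __m64 fold(__m64 x, __m64 m, int s)
{
    return _mm_add_pi32(_mm_and_si64(x, m),
                        _mm_and_si64(_mm_srli_si64(x, s), m));
}

/* Popcount of one 64-bit word, computed entirely in an MMX register. */
static unsigned count_word(uint64_t w)
{
    __m64 x = _mm_set_pi32((int)(uint32_t)(w >> 32), (int)(uint32_t)w);
    x = fold(x, _mm_set1_pi32(0x55555555), 1);
    x = fold(x, _mm_set1_pi32(0x33333333), 2);
    x = fold(x, _mm_set1_pi32(0x0F0F0F0F), 4);
    x = fold(x, _mm_set1_pi32(0x00FF00FF), 8);
    x = fold(x, _mm_set1_pi32(0x0000FFFF), 16);
    x = _mm_add_pi32(x, _mm_srli_si64(x, 32));  /* low half + high half */
    return (unsigned)_mm_cvtsi64_si32(x);       /* result is in the low 32 bits */
}

/* Clears the MMX state (EMMS) so that x87 floating point works again. */
static void leave_mmx_state(void) { _mm_empty(); }

#else
/* Assumed fallback for targets without MMX: clear the lowest set bit
   until the word is zero. */
static unsigned count_word(uint64_t w)
{
    unsigned n = 0;
    while (w) { w &= w - 1; n++; }
    return n;
}
static void leave_mmx_state(void) { }
#endif

/* Average number of set bits per word. The integer work is done first,
   then the MMX state is cleared once, and only then is a double touched. */
double average_bits(const uint64_t *words, size_t n)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += count_word(words[i]);
    leave_mmx_state();  /* must come before the floating-point code below */
    return n ? (double)total / (double)n : 0.0;
}
```

Calling _mm_empty() once after the loop, instead of once per word, keeps the
cost of the instruction out of the part being timed.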

Cheers,
Dieter




Last modified: Thu, 15 Apr 21 08:11:13 -0700
