Computer Chess Club Archives



Subject: Re: C (compiler) question

Author: Gerd Isenberg

Date: 12:49:36 09/16/04



On September 16, 2004 at 13:53:12, Dann Corbit wrote:

>On September 16, 2004 at 13:18:48, Russell Reagan wrote:
>
>>On September 16, 2004 at 03:12:33, Tony Werten wrote:
>>
>>>Yes. I needed a rewrite anyway, and Borland doesn't seem willing to produce a
>>>64-bit compiler in the near future, which is a big disadvantage since I wanted to
>>>use the Kogge-Stone stuff rather than the 0x88 I used until now. (Actually, it
>>>will be a kind of mixture.)
>>
>>How has the Kogge-Stone stuff been working for you? I was never able to get it
>>to work efficiently enough (rotated bitboards were at least 2x faster). Of
>>course, I didn't write MMX assembly like Gerd, so obviously it won't be as fast
>>as his approach.
>
>I have found that assembly language can impede the optimizer. So a routine in
>assembly that benches twice as fast in a simple test harness can make no
>discernible difference in a large program, or can even slow it down.

That may happen, especially with small inlined MSC assembly blocks and their fixed
register allocation, where parameters are pushed on the stack even if they are
already in a register. GCC inline assembly seems smarter here. That is probably
one reason why Microsoft no longer supports inline assembly in its 64-bit
compiler, offering intrinsics under the compiler/optimizer's control instead.
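A hedged illustration of the register-allocation point (an added example, not code from this thread): with GCC's extended asm, operands are declared through constraints, so the optimizer chooses the registers and nothing has to be copied to a fixed register or the stack first. The sketch assumes GCC or Clang targeting x86-64 and falls back to plain C elsewhere; `bit_scan_forward` is a hypothetical helper name.

```c
#include <stdint.h>

/* Index (0..63) of the least significant set bit; bb must be nonzero,
   because BSF leaves its destination undefined for a zero input. */
static inline int bit_scan_forward(uint64_t bb)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t idx;
    /* "=r": the compiler picks any general-purpose register for idx.
       "rm": bb may come from a register or memory, whichever the
       optimizer already has, so no fixed register or stack copy is forced.
       "cc": BSF modifies the flags. */
    __asm__("bsfq %1, %0" : "=r"(idx) : "rm"(bb) : "cc");
    return (int)idx;
#else
    /* Portable fallback: shift until the lowest bit is set. */
    int idx = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        ++idx;
    }
    return idx;
#endif
}
```

With current compilers a builtin such as GCC's `__builtin_ctzll` usually does the same job while leaving the optimizer even more freedom.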

For Kogge-Stone or dumb7 fill routines it is necessary to use 64-bit register
files, at least if you want to process several directions or generators
simultaneously. If you don't use x87 floating point (the MMX registers are
aliased onto the x87 register stack), the eight 64-bit MMX registers are quite
a nice resource for such fill stuff on x86-32.
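To make the fill idea concrete, here is a minimal sketch in plain C (not Gerd's MMX code) of an occluded fill toward the north, once in Kogge-Stone style and once as a dumb7-style loop. It assumes the common little-endian rank-file mapping (a1 = bit 0, h8 = bit 63), and the function names are hypothetical. Every step operates on a whole 64-bit word, which is why a 64-bit register file (or MMX on x86-32) pays off.

```c
#include <stdint.h>

/* Assumed square mapping: a1 = bit 0, b1 = bit 1, ..., h8 = bit 63.
   "North" is a shift left by 8; bits shifted past h8 simply fall off,
   so this direction needs no wrap mask (east/west would need file masks). */

/* Kogge-Stone occluded fill: gen = sliders, pro = empty squares.
   Shifts of 8, 16 and 32 cover distances of up to seven ranks. */
static uint64_t north_occluded_ks(uint64_t gen, uint64_t pro)
{
    gen |= pro & (gen << 8);
    pro &= pro << 8;
    gen |= pro & (gen << 16);
    pro &= pro << 16;
    gen |= pro & (gen << 32);
    return gen;
}

/* dumb7-style fill: one rank per step. Seven steps give the complete
   occluded fill; attack generators often stop after six, because the
   final shift in north_attacks supplies the last square anyway. */
static uint64_t north_occluded_dumb7(uint64_t gen, uint64_t pro)
{
    uint64_t flood = gen;
    for (int i = 0; i < 7; i++) {
        gen = (gen << 8) & pro;
        flood |= gen;
    }
    return flood;
}

/* Attacked squares: one more shift adds the first blocker, if any. */
static uint64_t north_attacks(uint64_t sliders, uint64_t empty)
{
    return north_occluded_ks(sliders, empty) << 8;
}
```

The Kogge-Stone version needs three shift/and/or rounds instead of seven, at the price of the extra `pro` updates; both are pure register code.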

The MMX intrinsics of MSC6 are not that smart and suffer a lot from
unnecessary loads/stores. When up to eight 64-bit MMX registers are used, with
one or two general-purpose registers addressing the source or target structures,
MSC inline assembly clearly outperforms the MMX intrinsics.

>
>So assembly always needs to be benchmarked in the place you intend to use it.

Yes. If you test rotated attack getters using lookup tables against extensive
stall-free register processing, the former is even faster in test loops ;-)
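To illustrate the comparison in the simplest direction, the sketch below computes rook attacks along a rank twice: once from a precomputed table indexed by the six inner occupancy bits of that rank (the lookup idea behind rotated bitboards, where the rank direction needs no rotation), and once with east/west Kogge-Stone fills kept entirely in registers. The square mapping (a1 = bit 0) and all names are assumptions for illustration. Both must return the same set; which one wins depends on the surrounding program, which is the benchmarking caveat above.

```c
#include <stdint.h>

#define NOT_A_FILE 0xfefefefefefefefeULL
#define NOT_H_FILE 0x7f7f7f7f7f7f7f7fULL

/* first_rank_attacks[inner][file]: attacked files on one rank for a slider
   on `file`, given the occupancy of files b..g (6 bits). The edge files
   never change the result, so they are left out of the index. */
static uint8_t first_rank_attacks[64][8];

static void init_first_rank_attacks(void)
{
    for (int inner = 0; inner < 64; inner++) {
        unsigned occ = (unsigned)inner << 1;  /* place bits on files b..g */
        for (int file = 0; file < 8; file++) {
            unsigned att = 0;
            for (int f = file + 1; f < 8; f++) {   /* walk east */
                att |= 1u << f;
                if (occ & (1u << f)) break;
            }
            for (int f = file - 1; f >= 0; f--) {  /* walk west */
                att |= 1u << f;
                if (occ & (1u << f)) break;
            }
            first_rank_attacks[inner][file] = (uint8_t)att;
        }
    }
}

/* Table lookup: extract the six inner bits of the slider's rank. */
static uint64_t rank_attacks_lookup(int sq, uint64_t occupied)
{
    int rank = sq >> 3, file = sq & 7;
    unsigned inner = (unsigned)(occupied >> (rank * 8 + 1)) & 63u;
    return (uint64_t)first_rank_attacks[inner][file] << (rank * 8);
}

/* Kogge-Stone occluded fills; the file masks stop wraps between ranks. */
static uint64_t east_occluded(uint64_t gen, uint64_t pro)
{
    pro &= NOT_A_FILE;
    gen |= pro & (gen << 1);
    pro &= pro << 1;
    gen |= pro & (gen << 2);
    pro &= pro << 2;
    gen |= pro & (gen << 4);
    return gen;
}

static uint64_t west_occluded(uint64_t gen, uint64_t pro)
{
    pro &= NOT_H_FILE;
    gen |= pro & (gen >> 1);
    pro &= pro >> 1;
    gen |= pro & (gen >> 2);
    pro &= pro >> 2;
    gen |= pro & (gen >> 4);
    return gen;
}

/* Register-only version: no table access at all. */
static uint64_t rank_attacks_fill(int sq, uint64_t occupied)
{
    uint64_t slider = 1ULL << sq, empty = ~occupied;
    uint64_t east = (east_occluded(slider, empty) << 1) & NOT_A_FILE;
    uint64_t west = (west_occluded(slider, empty) >> 1) & NOT_H_FILE;
    return east | west;
}
```

Call `init_first_rank_attacks()` once before using `rank_attacks_lookup`. In an isolated test loop the 512-byte table stays in L1 cache, which flatters the lookup; inside a full engine that advantage may shrink.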





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.