Author: Rafael Andrist
Date: 11:46:47 11/16/01
On November 15, 2001 at 11:36:55, Sven Reichard wrote:
>Although I have always favored a clear structure over raw speed, lately I have
>been experimenting with bitboards. I came across a pipeline bottleneck that I
>would expect today's compilers to avoid. However, this doesn't seem to be the
>case, at least not for gcc.
>
>Processor: K6-2
>Compiler: g++ -O6
>(tried also -march=k6 and -march=pentium)
>The following routine puts a piece on a square, updating the bitboards. It also
>keeps track of the first-order (material) evaluation. I have a global array
>(actually static to the class):
>
>signed short values[MaxPiece][64];
>
>The straightforward implementation looked like:
>
>void setSquare(char sq, Piece p)
>{
>  // <snip> manipulate bitboards
>  // and then
>  material += values[p][sq];
>}
>
>Changing that to
>
>void setSquare(char sq, Piece p)
>{
>  signed short value = values[p][sq];
>  // <snip> manipulate bitboards
>  material += value;
>}
>gave a (more or less substantial) increase in speed.
>(The reason, I suspect, is that values is not in the cache, so it has to be
>fetched from memory, which costs a number of extra cycles. We can use those
>cycles to do other work.)
>Has anybody had a similar experience, or are there compilers out there that do
>this kind of optimization automatically? Or is there some reason this can't
>be done?
If you read a value from memory (or from cache) into a CPU register, you can't
use it immediately, or you get punished by a so-called AGI stall. When I write C
code I usually don't worry about this; I only do when I write in assembler. A
good compiler should generate code with as few AGI stalls as possible.
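To illustrate the point, here is a minimal sketch (the names, the plain int
piece parameter and the [piece][square] layout are my own assumptions, not
Sven's actual engine code) of why hoisting the load can help on a cache miss:

// Minimal sketch only; the scheduling difference is the point,
// not the bitboard logic.
const int MaxPiece = 12;
signed short values[MaxPiece][64];
unsigned long long bitboards[MaxPiece];
int material = 0;

void setSquareNaive(int sq, int p)
{
    bitboards[p] |= 1ULL << sq;         // bitboard update
    material += values[p][sq];          // load and use back to back:
                                        // a cache miss stalls the add here
}

void setSquareHoisted(int sq, int p)
{
    signed short value = values[p][sq]; // issue the load early; on a miss
                                        // the fetch can overlap with the
                                        // independent bitboard update below
    bitboards[p] |= 1ULL << sq;
    material += value;                  // by now the value has (hopefully)
                                        // arrived, so no stall
}

Whether a compiler does this reordering on its own depends, among other things,
on how much it can prove about aliasing between values, material and the
bitboards; doing the hoist by hand removes that doubt.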
Rafael B. Andrist