Computer Chess Club Archives



Subject: Speedups for BitBoard programs on 64-bit machines

Author: Gian-Carlo Pascutto

Date: 13:18:55 06/04/02


(This discussion started on the chess-engines list and got a bit too big,
so I'm moving it over here.)

I started here:

----- Original Message -----
From: "Robert M. Hyatt" <hyatt@cis.uab.edu>
To: <chess-engines@yahoogroups.com>
Sent: Tuesday, June 04, 2002 6:48 PM
Subject: RE: [chess-engines] Moves in first five ply


> On Tue, 4 Jun 2002, Vincent Diepeveen wrote:
>
> > My data structure and the one from Yace are a factor of 2 faster
> > than yours, Bob, on 32-bit processors.
> >
> > On 64-bit processors you win 33% or so in speed (see the
> > SPEC results for Alpha processors). My guess, however, is that this is
> > not due to the 64 bits so much; it is also partly because the Alpha
> > issues 4 instructions a clock versus 3.
> >
> > So still a lot slower.
>
>
> If you believe that, it's OK by me.
>
> I don't...

Isn't it possible to more or less objectively figure this out?
We _have_ benchmarks of Crafty on Alpha and Intel/AMD, since Crafty
is one of the SPEC CPU2000 integer programs (186.crafty).

Best Alpha in SPEC CPU2000:

Compaq Computer Corp AlphaServer ES45 Model 68/1000
Alpha 21264C 1000 MHz

Crafty ratio: 816      Average SPECINT over all apps: 679

Closest AMD matching machine in average SPECINT speed:

Advanced Micro Devices Asus A7M266-D Motherboard
AMD Athlon (TM) MP 2000+

Crafty ratio: 967      Average SPECINT over all apps: 662

If I interpret this correctly, this implies that for two
machines which are within about 2.5% in overall CPU speed (679 vs. 662),
the 64-bit machine is SLOWER at running Crafty than the 32-bit
one (816 vs. 967, roughly 16% slower). Hmm.
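
To make that comparison explicit, one can divide each machine's Crafty ratio by its average SPECINT ratio, which factors out the overall CPU speed. The C sketch below does only that; the function name is an illustrative choice, and the inputs are the SPEC CPU2000 figures quoted above.

```c
#include <assert.h>

/* Crafty's SPEC ratio divided by the machine's average SPECint ratio.
   Above 1.0 means Crafty does relatively better on that machine than
   the average benchmark does; comparing two machines' values removes
   the difference in their overall CPU speed. */
double relative_crafty_score(double crafty_ratio, double specint_avg)
{
    assert(specint_avg > 0.0);
    return crafty_ratio / specint_avg;
}
```

With the figures above, relative_crafty_score(816, 679) is about 1.20 for the Alpha and relative_crafty_score(967, 662) is about 1.46 for the Athlon MP, so relative to the rest of the suite Crafty does roughly 22% better on the 32-bit machine.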

--
GCP


And the last posts were:

On Tue, 4 Jun 2002, Gian-Carlo Pascutto wrote:

> ----- Original Message -----
> From: "Dann Corbit" <dcorbit@connx.com>
> To: <chess-engines@yahoogroups.com>
> Sent: Tuesday, June 04, 2002 8:20 PM
> Subject: RE: [chess-engines] Moves in first five ply
>
>
> > A profile of crafty shows that it is very even handed.  In other words,
> > there are no striking bottlenecks.  In most programs, you can easily
> > find some place and say that if you could speed up one or two places you
> > would get a big boost.
> >
> > I suspect that what happens is that the bitboard stuff ceases being a
> > bottleneck and something else becomes a more dominant speed trap.  Of
> > course, my speculation is even wilder because I have never profiled
> > Crafty on a 64-bit machine (despite the fact that we have many
> > different kinds here).
>
> Considering Crafty basically consists of no more than fiddling around
> with bitboards, I agree with Vincent that it would be RAM latency.
>
> That's a pretty nasty one, considering CPUs have been getting faster
> faster than RAM has. Your hope would be that someday the internal
> caches get big enough to contain all essential bitboard data.
>
>

Already done.  _None_ of Crafty's data arrays used for updating the
bitmaps adds up to a megabyte.  I.e., the rotated lookup tables are
64 * 256 (=16K) entries of 8 bytes each (128 KB per table).  There are
four of them, 512 KB in total.

The remainder are much smaller and the entire kit and kaboodle fits
in under 1 MB.  Memory bandwidth is not a problem with any reasonable
L2 cache size.  My Xeons have 1 MB of L2 and work just fine.
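
The size figure can be checked against a sketch of one such table. The example below builds sliding-piece attacks along a rank, indexed by square and by the 8-bit occupancy of that square's rank, which is the 64 * 256 layout described above. The names, the square numbering (a1 = bit 0, h8 = bit 63) and the single-table layout are assumptions for illustration, not Crafty's actual code. Ranks need no rotation; the file and diagonal tables (the other three of the four) index the same kind of table from rotated copies of the occupancy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: 64 squares x 256 rank-occupancy patterns, one
   64-bit attack set per entry (64 * 256 * 8 bytes = 128 KB). */
static uint64_t rank_attacks[64][256];

/* For each square and each 8-bit occupancy of its rank, slide west and
   east until the first occupied square, which is included (a capture). */
void init_rank_attacks(void)
{
    for (int sq = 0; sq < 64; sq++) {
        int file = sq & 7, rank = sq >> 3;
        for (int occ = 0; occ < 256; occ++) {
            unsigned bits = 0;
            for (int f = file - 1; f >= 0; f--) {   /* toward the a-file */
                bits |= 1u << f;
                if (occ & (1 << f)) break;          /* blocker reached */
            }
            for (int f = file + 1; f < 8; f++) {    /* toward the h-file */
                bits |= 1u << f;
                if (occ & (1 << f)) break;
            }
            /* Move the 8-bit pattern onto the square's own rank. */
            rank_attacks[sq][occ] = (uint64_t)bits << (8 * rank);
        }
    }
}

/* Rank attacks of a rook or queen on sq, given the full occupancy:
   extract that rank's 8 bits and use them as the table index. */
uint64_t rook_rank_attacks(int sq, uint64_t occupied)
{
    assert(sq >= 0 && sq < 64);
    unsigned occ = (unsigned)(occupied >> (8 * (sq >> 3))) & 0xFFu;
    return rank_attacks[sq][occ];
}
```

After init_rank_attacks(), sizeof rank_attacks is 64 * 256 * 8 = 131072 bytes (128 KB), so four tables of this shape come to 512 KB, consistent with the "under 1 meg" claim.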





--
Robert Hyatt                    Computer and Information Sciences
hyatt@cis.uab.edu               University of Alabama at Birmingham
(205) 934-2213                  115A Campbell Hall, UAB Station
(205) 934-5473 FAX              Birmingham, AL 35294-1170

----- Original Message -----
From: "Robert M. Hyatt" <hyatt@cis.uab.edu>
To: <chess-engines@yahoogroups.com>
Sent: Tuesday, June 04, 2002 9:54 PM
Subject: Re: [chess-engines] Moves in first five ply



> I don't follow the logic.  For applications that are designed
> around 32 bits, you "normalize" the two processors.  Even though
> in the case of Crafty, we have an application that is _designed_
> around 64 bits?
>
> MHz for MHz, 32 vs. 64 bits is meaningless for applications that are
> designed around 32 bits.  There is no benefit.  But for applications
> that use the extra data density of 64-bit words, the advantage can
> be significant.

But then why are we not seeing a relative speedup on the Alpha?

Crafty is optimized for 64 bits, yet when running on a 64-bit
Alpha, it is relatively SLOWER than the other apps, which are
presumably not optimized for 64 bits.

By your reasoning, Crafty should show better relative performance
than the average application on the Alpha when compared to the x86
machines, but we see exactly the reverse in the SPEC results.
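
The "data density" argument can be illustrated with a single bitboard operation. Moving every piece one to three ranks north is one shift on a 64-bit word, but a 32-bit compiler holding the bitboard as two halves must also carry bits from the low word into the high word. The bb_pair type, the function names and the square numbering are hypothetical; this only sketches the extra work, it does not measure it.

```c
#include <assert.h>
#include <stdint.h>

/* A bitboard held as two 32-bit halves, as on a 32-bit machine.
   Square numbering assumed: a1 = bit 0 ... h8 = bit 63. */
typedef struct {
    uint32_t lo;  /* ranks 1-4 (bits 0..31)  */
    uint32_t hi;  /* ranks 5-8 (bits 32..63) */
} bb_pair;

bb_pair to_pair(uint64_t bb)
{
    bb_pair p = { (uint32_t)bb, (uint32_t)(bb >> 32) };
    return p;
}

uint64_t from_pair(bb_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Native 64-bit: move every bit n ranks toward rank 8 with one shift.
   Bits pushed past rank 8 simply fall off. */
uint64_t north_64(uint64_t bb, int n)
{
    assert(n >= 0 && n < 8);
    return bb << (8 * n);
}

/* 32-bit pair: the top n ranks of the low half must be carried into the
   high half, so the same move costs extra shifts and an OR.  Only n = 1..3
   is handled: for n = 4 the low half would be shifted by 32, which is
   undefined in C, so larger moves would need yet another case. */
bb_pair north_pair(bb_pair bb, int n)
{
    assert(n >= 1 && n <= 3);
    bb_pair r;
    r.hi = (bb.hi << (8 * n)) | (bb.lo >> (32 - 8 * n));
    r.lo = bb.lo << (8 * n);
    return r;
}
```

Both versions give the same bitboard, i.e. north_64(x, n) equals from_pair(north_pair(to_pair(x), n)) for n = 1..3; the pair version simply needs more instructions per step, which is the overhead a native 64-bit register avoids.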

--
GCP

----- Original Message -----
From: "Robert M. Hyatt" <hyatt@cis.uab.edu>
To: <chess-engines@yahoogroups.com>
Sent: Tuesday, June 04, 2002 9:51 PM
Subject: Re: [chess-engines] Moves in first five ply


> How would you feel about a 1.5M nps Crafty on a new IA64 at
> 1.0 GHz?
>
> Not bad when no 32-bit processor can come even close to that at
> more than double the clock speed...

As I stated before, the raw performance of the machine is
meaningless.

If you put a 0x88 program on it, it might very well also
skyrocket in NPS.

(Nice NPS anyway; IIRC the first-generation IA64 was just
pathetic with Crafty.)

--
GCP

On Tue, 4 Jun 2002, Gian-Carlo Pascutto wrote:

> ----- Original Message -----
> From: "Dann Corbit" <dcorbit@connx.com>
> To: <chess-engines@yahoogroups.com>
> Sent: Tuesday, June 04, 2002 7:44 PM
> Subject: RE: [chess-engines] Moves in first five ply
>
>
> > At some point, 64-bit CPUs are going to clobber the 32-bit ones [the 8-bit
> > story and 16-bit story are sure to repeat themselves].  Right now,
> > 64-bit chips are running at half the clock rate of the 32-bit chips.  I
> > don't imagine that they are going to suddenly release CPUs that have 4x
> > the performance, because it would kill off their current inventory of
> > 32-bit stuff.  I suspect we will see a slow dribble of technology updates
> > (like always) unless something pushes them.
>
> The issue isn't whether 64-bit CPUs will get faster than 32-bit ones.
> They will. The issue is whether (and how much) a (bitboard) chess
> program benefits from being able to work with 64-bit quantities.
>
> The 32-bit CPUs didn't clobber the 16-bit ones so much because of
> the possibility of using 32-bit data quantities instead of 16-bit ones,
> but because they were more advanced and faster designs.
>
> We're not debating whether the CPUs are faster or not; we're debating
> whether the 32-bit to 64-bit transition in itself has significant
> performance advantages.
>
> The original comparison pointed out that for two processors that are
> of similar speed over a wide variety of benchmarks, the 64-bit one does
> not have an advantage over the 32-bit one as far as Crafty goes.

Here I simply disagree.  "Which" 32-bit processor would you suggest that
can do 1.5M nps with Crafty?  Intel has a 64-bitter at 1.0 GHz that will
do this.  And it will clock a good bit higher than 1.0 GHz too, to
further "spread the divide"...



>
> For me, that is a surprising result. It seems that Crafty does not
> get faster when the bitboards fit in registers and can be manipulated
> by single instructions.
>

Because you are using a processor that is clocked at twice the
frequency?  Why compare a 1 GHz processor to a (nearly) 2 GHz processor
and conclude anything about efficiency there?  Is there anything that
suggests that the Alpha is simply more "efficient", enough to justify that
clock frequency disparity?

A machine twice as fast (in clock frequency) _should_ perform just as well
as a 64-bit machine at half the frequency...  Less would suggest that the
32-bit machine simply sucks badly.
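
The clock question can be examined with the same SPEC figures by dividing each ratio by the clock rate. This sketch assumes 1000 MHz for the Alpha 21264C (as listed earlier) and 1667 MHz for the Athlon MP 2000+; the latter is our understanding of that part's actual clock, not a figure given in this thread. The function name is illustrative.

```c
#include <assert.h>

/* How much more SPEC ratio machine A delivers per MHz than machine B.
   A result of 1.4 means A does 40% more work per clock on that
   benchmark; 1.0 means equal per-clock efficiency. */
double per_clock_advantage(double ratio_a, double mhz_a,
                           double ratio_b, double mhz_b)
{
    assert(mhz_a > 0.0 && mhz_b > 0.0 && ratio_b > 0.0);
    return (ratio_a / mhz_a) / (ratio_b / mhz_b);
}
```

Under that clock assumption, per_clock_advantage(816, 1000, 967, 1667) is about 1.41 for Crafty, while per_clock_advantage(679, 1000, 662, 1667) is about 1.71 for the SPECint average: the Alpha does more per clock on both, but its per-clock lead is smaller on Crafty than on the average program.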

McKinley certainly bears watching.  It is producing some amazing
numbers.



> A possible explanation might be that, right now, the bitboards allow
> the processors to take full advantage of their superscalar design by
> always keeping a high number of pipelines busy. When the bitboards
> can be manipulated by single instructions, you suddenly can't issue
> multiple instructions per clock any more because of data dependencies.
> Your secondary and tertiary pipelines stay empty and you run no faster.
> (The above paragraph is wild speculation.)
>


That is the reason bitboards work well at all on 32-bit machines.  It
is difficult to keep 2 or more scalar pipes busy.  Pairs of 32-bit
operations make this easier... and use cycles that would otherwise be
lost...
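
The point about pairs of 32-bit operations can be sketched as follows. A typical bitboard expression (here, hypothetically: attacked enemy squares that are not defended) becomes two 32-bit chains on a 32-bit machine, one per half, and the two chains share no data, so a superscalar core may issue them side by side. The type and function names are hypothetical, and whether a given compiler and CPU actually overlap the two chains depends on instruction scheduling and the core; this shows the independence, not a measured speedup.

```c
#include <assert.h>
#include <stdint.h>

/* A bitboard held as two 32-bit halves, as on a 32-bit machine. */
typedef struct { uint32_t lo, hi; } bb_pair;

uint64_t pair_to_u64(bb_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit version: one AND/ANDNOT expression on full-width registers. */
uint64_t undefended_targets_64(uint64_t attacks, uint64_t enemy,
                               uint64_t defended)
{
    return attacks & enemy & ~defended;
}

/* 32-bit version: the same expression split into two chains.  The lo
   chain never reads hi data and vice versa, so neither waits on the
   other and a superscalar core is free to run them in parallel. */
bb_pair undefended_targets_pair(bb_pair attacks, bb_pair enemy,
                                bb_pair defended)
{
    bb_pair r;
    r.lo = attacks.lo & enemy.lo & ~defended.lo;
    r.hi = attacks.hi & enemy.hi & ~defended.hi;
    return r;
}

/* Population count of one 32-bit half: each iteration clears the
   lowest set bit. */
static int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Two independent counts, one per half, then a single add. */
int popcount_pair(bb_pair p)
{
    return popcount32(p.lo) + popcount32(p.hi);
}
```

The 64-bit and pair versions compute the same set; the difference is only in how the work is spread over instructions that can or cannot run at the same time.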



>







Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.