Computer Chess Club Archives


Subject: Comparing "speeds" of hardware versus software

Author: Steffan Westcott

Date: 07:03:55 02/14/04


On February 14, 2004 at 00:24:33, Luis Smith wrote:

>Do you know what speed Brutus/Hydra runs on one of those FPGA
>cards compared to the 3.06 chip?  I imagine the FPGA chip would be significantly
>faster than a Xeon processor.

Comparing the speed of an FPGA (or ASIC) application (e.g. chess) to that of a
CPU running application software is not straightforward at all. Both contain the
concept of a hardware clock (or clocks), but it is meaningless to compare clock
speeds, as clock speed does not measure the amount of useful work done per clock
cycle. (Incidentally, FPGA applications usually run at a clock frequency of
around 50 MHz to 100 MHz or so, but this is a gross generalisation.)
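
As a rough illustration of why clock frequency alone says little, the sketch
below divides clock frequency by the number of cycles one evaluation takes. The
3.06 GHz and 100 MHz figures come from this thread; the cycles-per-evaluation
numbers (3000 and 8) and the function name are assumptions picked only to make
the arithmetic concrete, not measurements of any real program or chip.

```python
def evaluations_per_second(clock_hz, cycles_per_eval):
    """Toy model: evaluations run back to back with no overlap."""
    return clock_hz / cycles_per_eval

# Hypothetical CPU program needing 3000 cycles per evaluation at 3.06 GHz.
cpu_rate = evaluations_per_second(3_060_000_000, 3000)

# Hypothetical hardware design needing 8 cycles per evaluation at 100 MHz.
fpga_rate = evaluations_per_second(100_000_000, 8)

print(cpu_rate, fpga_rate)
```

With these made-up cycle counts the slower-clocked design completes more
evaluations per second, which is the only point of the example.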

An area where this difference is most obvious is chess position evaluation.

A CPU chess program would include some CPU instructions to evaluate properties
of a chess position, and most likely produce a score, among other results.

To 'add chess knowledge' to the CPU chess program, more detailed properties
about the chess position are sought, so more CPU instructions are added to the
program to evaluate them. These extra instructions mean more CPU clock cycles
are needed to perform a full position evaluation, on average.
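
A minimal sketch of that cost model, assuming each evaluation term costs a
fixed number of cycles and terms run one after another. The term names and
cycle counts are hypothetical and not taken from any real engine.

```python
# Hypothetical per-term cycle costs; illustrative numbers, not measurements.
BASE_TERMS = {"material": 40, "piece_square": 60}
EXTRA_TERMS = {"pawn_structure": 120, "king_safety": 200}

def cpu_eval_cycles(terms):
    """Sequential model: terms execute one after another, so costs add up."""
    return sum(terms.values())

base_cycles = cpu_eval_cycles(BASE_TERMS)
richer_cycles = cpu_eval_cycles({**BASE_TERMS, **EXTRA_TERMS})

print(base_cycles, richer_cycles)
```

In this model every added piece of knowledge lengthens the evaluation.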

A hardware (FPGA, ASIC) chess application could be implemented in many ways. The
evaluation portion satisfies the same requirement to evaluate properties of a
chess position, produce scores and other results and so on, but it is not
restricted to the CPU model of executing an instruction stream to achieve it.
One approach is to present the entire chess position as an input to a custom
logic function, which produces its results in one clock cycle. This is not the
most viable approach, however, as the logic function would be extremely large,
complex and deep, and so would have a low maximum clock frequency. Another approach is to
pipeline the evaluation over 8 clock cycles, performing a file-wise sweep over
the chess position. This will reduce logic size, complexity and depth, and
increase maximum clock frequency, but also increase the number of clock cycles
needed (and perhaps latency).
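
The file-wise sweep can be modelled in software as a loop that consumes one
file per simulated clock cycle. This is only a sketch of the idea: the function
names, the board encoding (one 8-character string per file, rank 1 first,
uppercase for White) and the material-only scoring are all assumptions made for
the example, and say nothing about how Brutus/Hydra actually evaluates.

```python
PIECE_VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900, "K": 0}

def file_stage(file_squares):
    """Stand-in for the per-file logic: material balance of one file."""
    score = 0
    for square in file_squares:
        if square == ".":
            continue  # empty square
        value = PIECE_VALUES[square.upper()]
        score += value if square.isupper() else -value
    return score

def sweep_evaluate(board_files):
    """Return (score, cycles); one file is consumed per modelled clock cycle."""
    score = 0
    cycles = 0
    for file_squares in board_files:
        score += file_stage(file_squares)
        cycles += 1
    return score, cycles

# Starting position, files a to h, each listed from rank 1 to rank 8.
START = [back + "P" + "...." + "p" + back.lower() for back in "RNBQKBNR"]

print(sweep_evaluate(START))
```

Whatever the position, the sweep takes 8 modelled cycles; only the work done
inside each cycle depends on the per-file logic.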

To 'add chess knowledge' to either of the hardware based approaches, more logic
terms are added to the logic function. These extra logic terms mean greater
logic size, and perhaps a small impact on clock frequency, but no change in the
number of clock cycles needed to perform a full position evaluation.
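
The contrast with the CPU case can be sketched with a toy cost model in which
evaluation terms sit side by side in logic. The gate counts and term names are
hypothetical, and the model deliberately ignores the possible small effect on
clock frequency mentioned above.

```python
SWEEP_CYCLES = 8  # one cycle per file, as in the file-wise sweep approach

def hardware_cost(term_gate_counts):
    """Parallel model: logic area adds up per term, cycle count stays fixed."""
    return {"cycles": SWEEP_CYCLES, "gates": sum(term_gate_counts.values())}

# Hypothetical gate counts; illustrative numbers only.
base_hw = hardware_cost({"material": 2000, "piece_square": 3000})
richer_hw = hardware_cost(
    {"material": 2000, "piece_square": 3000, "king_safety": 9000}
)

print(base_hw, richer_hw)
```

Adding the extra term grows the logic but leaves the cycle count unchanged.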

Since comparing clock speeds between CPU and FPGA/ASIC chess applications is
not useful, it might be tempting to compare the rate of chess position
evaluations instead (aka "nodes per second"). Unfortunately, this is not very
helpful either, as chess programs in general vary in the quantity and quality
of chess evaluations they perform.

Cheers,
Steffan



Last modified: Thu, 15 Apr 21 08:11:13 -0700