Computer Chess Club Archives

Subject: Re: Speedups for BitBoard programs on 64-bit machines

Author: Robert Hyatt
Date: 17:33:02 06/06/02
On June 06, 2002 at 19:54:24, Vincent Diepeveen wrote:

>On June 06, 2002 at 10:24:27, Robert Hyatt wrote:
>
>There is a huge difference between the test on this
>processor, because running at 2 processors it was very
>slow from hardware viewpoint. Like 1.5 was mentioned
>then too.

OK...  so what?  It was very fast on the single-cpu test...


>
>Also we talk about a very old version of crafty here compared
>to the crafty that's existing today. I remember that you had
>way less king safety and some other scans in these crafties
>and you did do less here and there.


It was the last 16.x version, I believe.  I am now on 18.15, but
king safety hasn't been greatly modified during that span...




>
>In short about all was allowed to get more nps, whereas
>right now the 'default' assembly used for K7/P4 is fucking
>slow beginners assembly. This was of course not put to
>'slow' at this alpha test, as there were no 'specint'
>limits.

I don't know what you mean.  I know for 100% certainty that Tim didn't
modify the source code.  He was running gnuchess on ICC one night and we
noticed an impossible NPS.  I asked if he would try crafty and he said
sure.  I sent him the source, and we had benchmark numbers about 10
minutes later.  He then ran WAC (one minute/pos) and sent me the results
which I include here:

1 cpu  21264/600mhz:
total positions searched..........         300
number right......................         300
number wrong......................           0
percentage right..................         100
percentage wrong..................           0
total nodes searched.............. 236973211.0
average search depth..............         4.5
nodes per second..................      783641

4 cpus  quad xeon 550:

total positions searched..........         300
number right......................         299
number wrong......................           1
percentage right..................          99
percentage wrong..................           0
total nodes searched.............. 280348143.0
average search depth..............         4.5
nodes per second..................      722788

2 cpus, 21264/600mhz:

total positions searched..........         300
number right......................         300
number wrong......................           0
percentage right..................         100
percentage wrong..................           0
total nodes searched.............. 330905102.0
average search depth..............         4.5
nodes per second..................     1266767

Not bad.  I had remembered 1M and 1.5M.  I just verified that those numbers
were produced on a 667mhz machine instead, at Compaq.  A slightly faster version
of Tim's machine.  And right in line with the 1.5M single-cpu speed of McKinley
at 1ghz.
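
As a rough check on the figures above, the raw NPS scaling can be computed directly. The sketch below is only illustrative: the function names `nps_speedup` and `nps_efficiency` are hypothetical helpers, not anything from Crafty. It measures NPS scaling only, not time-to-solution speedup, since the two runs searched different node counts.

```c
#include <assert.h>

/* Ratio of multi-CPU nodes per second to single-CPU nodes per second.
   Hypothetical helper for illustration; not part of Crafty. */
double nps_speedup(double base_nps, double parallel_nps) {
    assert(base_nps > 0.0);
    return parallel_nps / base_nps;
}

/* NPS speedup divided by the number of processors used. */
double nps_efficiency(double base_nps, double parallel_nps, int cpus) {
    assert(cpus > 0);
    return nps_speedup(base_nps, parallel_nps) / cpus;
}
```

With the WAC numbers quoted above, `nps_speedup(783641, 1266767)` comes out a little over 1.6 for the dual 21264/600, and the recalled 1M and 1.5M figures for the 667 MHz machine give 1.5.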





>
>It was *not* a production alpha ever, the test was done long
>before this type of alpha was put on the market, so we don't
>know whether you can buy this alpha in the shop.

I have no idea what you are talking about.  I had exactly that machine
here in my lab, for 6+ months.  (single-cpu version).  It ran at 667 mhz
and produced 1M nodes per second.  I didn't do much with chess on it as it
was here to do some work for someone up the street from here.  But it was
(and is) available for purchase.

I had that machine over a year ago.  It was not a "black box" but had a name
plate on the front and could be ordered from whoever owned the DEC stuff
at that point in time.

Someone up in the medical school bought the thing, left it here for me to
work on some code for him, and that was that...



>
>There is another list of things wrong.
>
>For example if it was such a slow processor, why only getting
>1.5 hardware speedup out of 2 processors?

Because the hash table used locks.  And the locks were very bad on the
alpha.  We later went to the "lockless hash table" that I now use.  I
never had access to either machine (Tim's or the one in the medical
school here) to run WAC again after that was fixed.  The out-of-order
memory writes on the alpha require a "barrier" prior to clearing the
lock, and the lock/unlock themselves are also very expensive.  Both
together (lock/barrier) really produced a bottleneck.  No mystery at
all...
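
For readers who have not seen the idea, here is a minimal sketch of one common way to build a lockless hash table: store the key XORed with the data, so an entry torn by two concurrent writers fails the check on probe and is simply treated as a miss. It assumes this XOR scheme is what "lockless hash table" refers to here. The names `HashEntry`, `hash_store`, `hash_probe`, and `TABLE_SIZE`, the two-word layout, and the use of C11 relaxed atomics are illustrative assumptions, not Crafty's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-word entry: no lock field at all. */
typedef struct {
    _Atomic uint64_t check;  /* key XOR data */
    _Atomic uint64_t data;   /* packed move, score, depth, bound */
} HashEntry;

#define TABLE_SIZE (1u << 16)  /* illustrative size, power of two */
static HashEntry table[TABLE_SIZE];

/* Store without locking. Each word is written atomically, but the
   pair is not, so another thread may interleave its own store. */
void hash_store(uint64_t key, uint64_t data) {
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    atomic_store_explicit(&e->check, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

/* Probe without locking. Returns 1 and fills *data_out on a hit.
   A torn entry (check from one store, data from another) fails the
   XOR test and is reported as a miss, which costs only a re-search.
   Note: an all-zero slot would match key 0; real code would reserve
   a data value to mean "empty". */
int hash_probe(uint64_t key, uint64_t *data_out) {
    assert(data_out != NULL);
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    uint64_t data = atomic_load_explicit(&e->data, memory_order_relaxed);
    uint64_t check = atomic_load_explicit(&e->check, memory_order_relaxed);
    if ((check ^ data) != key)
        return 0;
    *data_out = data;
    return 1;
}
```

The point of the design is that neither the store nor the probe needs a lock or a memory barrier, which is exactly the cost described above for the Alpha.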

I think we mentioned this in the paper we wrote for ICCA which ought to
appear in the next issue.



>
>That means a cheap dual K7 getting 2 million nodes a second is still
>faster than this 1.5 million nodes a second dual alpha.

I have not yet seen a dual K7 get 2M nodes per second with Crafty...


>
>Note that we compare a todays crafty version with that special
>old thing then. Also we assume then beginners assembly for the
>current dual K7 crafty, versus optimal defines for the alpha.

The version Tim had was not that old.  The version I ran on the 667 mhz
machine was even newer, in the 17.x group...

>
>That's not a very fair compare.

Seems perfectly fair to me...