Computer Chess Club Archives


Subject: Re: Speedups for BitBoard programs on 64-bit machines

Author: Vincent Diepeveen

Date: 17:59:49 06/06/02


On June 06, 2002 at 20:33:02, Robert Hyatt wrote:

>On June 06, 2002 at 19:54:24, Vincent Diepeveen wrote:
>
>>On June 06, 2002 at 10:24:27, Robert Hyatt wrote:
>>
>>There is a huge difference between the test on this
>>processor, because running at 2 processors it was very
>>slow from hardware viewpoint. Like 1.5 was mentioned
>>then too.
>
>OK...  so what?  It was very fast on the single-cpu test...
>>
>>Also we talk about a very old version of crafty here compared
>>to the crafty that's existing today. I remember that you had
>>way less king safety and some other scans in these crafties
>>and you did do less here and there.
>
>
>it was the last 16.x version I believe.  I am now on 18.15, but
>king safety hasn't been greatly modified during that span...

You have a large number of functions called
  'evaluationBLABLA'

where blabla stands for a lot of simplistic things. The evaluation itself
eats just 10%, so obviously you lose that other 30% *somewhere* in these
functions.

We are talking about a crafty from a bunch of years ago now.

>
>
>
>>
>>In short, just about anything was allowed to get more nps, whereas
>>right now the 'default' assembly used for K7/P4 is fucking
>>slow beginner's assembly. This was of course not set to
>>'slow' in this alpha test, as there were no 'specint'
>>limits.
>
>I don't know what you mean.  I know for 100% certainty that Tim didn't
>modify the source code.  He was running gnuchess on ICC one night and we
>noticed an impossible NPS.  I asked if he would try crafty and he said
>sure.  I sent him the source, and we had benchmark numbers about 10
>minutes later.  He then ran WAC (one minute/pos) and sent me the results
>which I include here:
>
>1 cpu  21264/600mhz:
>total positions searched..........         300
>number right......................         300
>number wrong......................           0
>percentage right..................         100
>percentage wrong..................           0
>total nodes searched.............. 236973211.0
>average search depth..............         4.5
>nodes per second..................      783641
>
>4 cpus  quad xeon 550:
>
>total positions searched..........         300
>number right......................         299
>number wrong......................           1
>percentage right..................          99
>percentage wrong..................           0
>total nodes searched.............. 280348143.0
>average search depth..............         4.5
>nodes per second..................      722788
>
>2 cpus, 21264/600mhz:
>
>total positions searched..........         300
>number right......................         300
>number wrong......................           0
>percentage right..................         100
>percentage wrong..................           0
>total nodes searched.............. 330905102.0
>average search depth..............         4.5
>nodes per second..................     1266767
>
>Not bad.  I had remembered 1M and 1.5M.  I just verified that those numbers
>were produced on a 667mhz machine instead, at Compaq.  A slightly faster version
>of Tim's machine.  And right in line with the 1.5M single-cpu speed of McKinley
>at 1 GHz.
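
For reference, the hardware (NPS) speedup implied by the two 21264/600 runs quoted above can be worked out directly from the reported nodes-per-second figures. The sketch below is only arithmetic on those two numbers; the function names `nps_speedup` and `parallel_efficiency` are illustrative and do not come from Crafty. It gives a ratio of roughly 1.6 for 2 cpus, or about 80% per-cpu efficiency.

```c
#include <stdio.h>

/* Ratio of multi-cpu NPS to single-cpu NPS (raw hardware speedup). */
double nps_speedup(double nps_multi, double nps_single)
{
    return nps_multi / nps_single;
}

/* Fraction of ideal linear scaling achieved with the given cpu count. */
double parallel_efficiency(double speedup, int cpus)
{
    return speedup / (double)cpus;
}

/* Print the speedup and efficiency for the quoted 21264/600 WAC runs. */
void report_alpha_speedup(void)
{
    const double nps_1cpu = 783641.0;   /* 1 cpu, 21264/600mhz  */
    const double nps_2cpu = 1266767.0;  /* 2 cpus, 21264/600mhz */
    double s = nps_speedup(nps_2cpu, nps_1cpu);

    printf("NPS speedup: %.2f, efficiency: %.0f%%\n",
           s, 100.0 * parallel_efficiency(s, 2));
}
```

This measures only raw node throughput. It says nothing about search overhead, since the 2-cpu run also searched more total nodes than the 1-cpu run.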
>
>
>
>
>
>>
>>It was *not* a production alpha ever, the test was done long
>>before this type of alpha was put on the market, so we don't
>>know whether you can buy this alpha in the shop.
>
>I have no idea what you are talking about.  I had exactly that machine
>here in my lab, for 6+ months.  (single-cpu version).  It ran at 667 mhz
>and produced 1M nodes per second.  I didn't do much with chess on it as it
>was here to do some work for someone up the street from here.  But it was
>(and is) available for purchase.
>
>I had that machine over a year ago.  It was not a "black box" but had a name
>plate on the front and could be ordered from whomever owned the DEC stuff
>at that point in time.
>
>Someone up in the medical school bought the thing, left it here for me to
>work on some code for him, and that was that...
>
>
>
>>
>>There is another list of things wrong.
>>
>>For example, if it was such a slow processor, why did it only get a
>>1.5 hardware speedup out of 2 processors?
>
>Because the hash table used locks.  And the locks were very bad on the
>alpha.  We later went to the "lockless hash table" that I now use.  I
>never had access to either machine (Tim's or the one in the medical
>school here) to run WAC again after that was fixed.  The out-of-order
>memory writes on the alpha require a "barrier" prior to clearing the
>lock, and the lock/unlock themselves are also very expensive.  Both
>together (lock/barrier) really produced a bottleneck.  No mystery at
>all...
>
>I think we mentioned this in the paper we wrote for ICCA which ought to
>appear in the next issue.
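
The post does not describe how the "lockless hash table" works. One commonly described scheme, assumed here, stores the position key XORed with the data word next to the data word itself; a probe recomputes the XOR and rejects any entry that a concurrent writer left half-updated, so no lock or barrier is needed around the table. The sketch below is a minimal illustration under that assumption. The names `HashEntry`, `hash_store`, `hash_probe` and `TABLE_SIZE`, the two-word layout, and the use of C11 relaxed atomics are all hypothetical and are not taken from Crafty's source.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-word entry; field names are illustrative only. */
typedef struct {
    _Atomic uint64_t key_xor_data;  /* position key XOR data word        */
    _Atomic uint64_t data;          /* packed score/depth/move (opaque)  */
} HashEntry;

#define TABLE_SIZE 1024u            /* must be a power of two */

static HashEntry table[TABLE_SIZE];

/* Store without a lock: each word is written atomically, but the pair
 * is not, so another thread may interleave its own store. */
void hash_store(uint64_t key, uint64_t data)
{
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    atomic_store_explicit(&e->key_xor_data, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

/* Probe without a lock: returns 1 and fills *data_out only if the two
 * words are consistent with each other and belong to this key.
 * A torn entry (words from two different stores) fails the XOR check
 * and is treated as a miss. */
int hash_probe(uint64_t key, uint64_t *data_out)
{
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    uint64_t k = atomic_load_explicit(&e->key_xor_data, memory_order_relaxed);
    uint64_t d = atomic_load_explicit(&e->data, memory_order_relaxed);

    if ((k ^ d) != key)
        return 0;
    *data_out = d;
    return 1;
}
```

The trade-off in this sketch is that a torn write costs one lost entry rather than a stalled processor, which is consistent with the bottleneck described above, where every store and probe paid for a lock plus a memory barrier.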
>
>
>
>>
>>That means a cheap dual K7 getting 2 million nodes a second is still
>>faster than this 1.5 million nodes a second dual alpha.
>
>I have not yet seen a dual K7 get 2M nodes per second with Crafty...
>
>
>>
>>Note that we compare today's crafty version with that special
>>old thing then. Also we assume then beginner's assembly for the
>>current dual K7 crafty, versus optimal defines for the alpha.
>
>The version Tim had was not that old.  The version I ran on the 667 mhz
>machine was even newer, in the 17.x group...
>
>>
>>That's not a very fair comparison.
>
>Seems perfectly fair to me...


