Computer Chess Club Archives


Subject: Re: Crafty modified to Deep Blue - Crafty needs testers to produce outputs

Author: Robert Hyatt

Date: 10:50:10 06/19/01



On June 19, 2001 at 11:41:39, Vincent Diepeveen wrote:

>On June 19, 2001 at 00:46:25, Robert Hyatt wrote:
>
>>On June 18, 2001 at 19:06:05, Vincent Diepeveen wrote:
>>
>>>
>>>This is utterly nonsense
>>>
>>>   a) their hashtable was already filled
>>>   b) first 30 seconds or so they search from parallel viewpoint
>>>      very inefficient. First 10 seconds or so they can only
>>>      get a few processors to work efficiently together, only
>>>     after perhaps a minute all processors slowly are working a
>>>     bit efficient.
>>>     I already have this problem for 4 processors, Zugzwang clearly
>>>     had this problem. etcetera.
>>
>>
>>_you_ might have this problem. I don't.  I will be quite happy to
>>post some speedup numbers for 1-16 processors with Cray Blitz, where
>>the search time limit is 1 second max.  And that speedup is no different
>>than the speedup produced for 60 second searches.  Ditto for Crafty, which
>>you can prove easily enough.
>
>  a) cray blitz is having shared memory
>  b) cray blitz is using nullmove (even though R is smaller as a few)
>  c) you don't have cray blitz source anymore, when i asked after it
>     some time ago you said you lost it.


I said I don't have it on my machine here any longer.  I could certainly
ask the folks up at Cray to dig back thru the tape archives to resurrect it
if needed.  I wouldn't do that just for fun, however, as it is a hassle for
them.


>  d) crafty blitz isn't using 16 hardware processors at each general cpu.
>     both you and i know that a single cpu can't split itself and give
>     16 hardware cpu's over a dead slow bus (with huge latency that is)
>     16 commands at the same time.


NO cpu can give 16 commands at the same time.  A cpu can do exactly one
operation at a time when dealing with I/O.

The VME bus is not "dead slow with huge latency".  Just visit somewhere and
look up the standard.  It is _far_ faster than the stuff you do with messages
and separate processes, for example.

I watched Deep Thought play speed chess all the time.  Your opinion about it
being horrible in the first 30 seconds is dead wrong.  It smashed GM after GM at
blitz... and it was searching just about as deeply as you would expect, given
its search depth for 3-minute searches...



>
>What a single processor had to do with DB was to get a position and
>then put somehow 16 processors to work.

That is trivial.  Generate all legal moves, make each of the first 16,
one at a time, then tell each processor "search that position to depth=6
and report back with the best move and score".  When one finishes, give it
the next one.  Have you read Hsu's thesis on this?  That was _not_ a problem.
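A minimal sketch of the work-queue scheme described above (the 16-worker count and depth=6 come from the text; the `search` stand-in and everything else are invented for illustration, not DB's or Crafty's actual code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def search(position, depth):
    # Stand-in for one hardware chess processor: search `position` to a
    # fixed depth and report back a score.  Dummy arithmetic here so the
    # sketch is runnable; a real searcher goes in its place.
    return sum(ord(c) for c in position) % 100

def root_split(root_moves, workers=16, depth=6):
    """Master-worker split at the root: hand out the first `workers`
    positions immediately; whenever a processor reports back, give it
    the next position from the list."""
    best_move, best_score = None, -1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool's internal queue implements "when one finishes,
        # give it the next one" for us.
        futures = {pool.submit(search, m, depth): m for m in root_moves}
        for done in as_completed(futures):
            score = done.result()
            if score > best_score:
                best_move, best_score = futures[done], score
    return best_move, best_score
```

`ThreadPoolExecutor` queues any moves beyond the first `workers`, so a freed processor picks up the next position automatically, which is exactly the hand-out discipline described above.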



>
>So there is a double splitting problem. First the normal split problems
>which at 30 processors is quite solvable as we both know, and you're
>offering to show now with something which you said you didn't have anymore
>that it's solvable!

You do realize that I _wrote_ Cray Blitz and that I _know_ exactly how it
worked?  I wish you would stop trying so hard to define what can _not_ be
done, just because you haven't yet done it, and spend more effort on trying
to do those things that look "hard" to do instead.  You continually say "you
can't get a reasonable speedup on a cluster" and then you say "Now I have
studied this a while and I can do this."  DB's hardware was so far ahead of
anything we have to play on that the hardware was _not_ the bottleneck.  I would
love to have something that I can spend a microsecond to give it a position,
and then have it give me something back in a few milliseconds that represents
a full 6-ply search on that position.  I would _love_ to have that problem...




>
>Secondly there is the problem that a single processor needs to efficiently
>split the tree it searches for 16 hardware processors.
>
>That's pretty hard to do :)

That isn't so hard to do.  I did it on the C90 and T90 just fine.  You can
find the performance results in an old ICCA journal.  Crafty is doing just about
as well as Cray Blitz did.



>
>20% total speedup out of 480 processors at a 3 minute search is
>hard to improve then!

So.  20% of 1B nodes per second peak is _still_ a very big number...

Or, if we take Hsu's statement "I was able to drive the chess processors at
about a 70% duty cycle, averaged over the game" then that gives 700M nodes
per second times 20%.  I think DB's "real efficiency" lies somewhere between
these two extremes.  They didn't "count nodes" at all, any more than Belle did.
I believe that his 200M came from "My search is about 20% efficient" that he
said many times.  20% times one billion == 200M.
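The arithmetic behind these two bounds is just multiplication (the figures are taken from the text above; the helper name is mine):

```python
def effective_nps(peak_nps, duty_cycle, search_efficiency):
    # Useful nodes per second: the peak rate, scaled by how busy the
    # processors are kept (duty cycle) and by the parallel search
    # efficiency (the fraction of searched nodes that actually help).
    return peak_nps * duty_cycle * search_efficiency

peak = 1_000_000_000                     # 1B nodes/sec peak, from the text
high = effective_nps(peak, 1.00, 0.20)   # ~200M: 20% of full peak
low = effective_nps(peak, 0.70, 0.20)    # ~140M: 20% of a 70% duty cycle
```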




>
>It's a *very* good speedup.
>
>If you run software on 480 processors i doubt you get to 20% speedup.


I'll take that bet.  And one day I will be able to post the numbers.  20%
doesn't seem unreachable to me at all... it seems way lower than I would
want, in fact...

>
>OF course that's with cray blitz and other programs. As soon as you
>search with a very stupid branching factor getting 20% speedup is
>possible to get.
>
>So *somehow* there it lost from its 200M nps 80% speed. It's not
>hard to imagine where. Near the leaves. Each leaf being a 6 ply
>search without hashtable, so the only position where it can split
>is the leaf position. If that position is giving a beta cutoff,
>too bad for 15 other processors! You get an actual speedup of 1/16
>then!

Nope... you just stop them instantly as I do in Crafty.  That is not a
problem to do.  And again, I suspect the 200M nodes per second already
accounts for that 20% number...
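The "stop them instantly" idea can be sketched with a shared abort flag (a sketch of the general technique, not Crafty's actual mechanism; `search_one` is a caller-supplied stand-in that a real engine would have poll the flag):

```python
import threading

def split_with_abort(positions, beta, search_one):
    """Search sibling positions in parallel.  The moment one sibling
    returns a score >= beta, raise a shared stop flag so the other
    processors can abandon their now-useless work."""
    stop = threading.Event()
    lock = threading.Lock()
    results = []

    def worker(pos):
        score = search_one(pos, stop)  # a real searcher polls `stop`
        with lock:
            results.append((pos, score))
        if score >= beta:
            stop.set()  # cutoff: tell the siblings to quit early

    threads = [threading.Thread(target=worker, args=(p,)) for p in positions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, stop.is_set()
```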



>
>The problems are easy to imagine.
>
>Best regards,
>Vincent
>
>>
>>
>>You say "first few ply out of hash table" but then you say "they can't hash in
>>last 6-7 plies which is killing performance."  Which is it?  It can _not_ be
>>both.
>
>first 6 ply are in hashtable of a search. so if i make a move
>then still 5 are left.

Nope, 8 are left.  I searched to depth=10, I played a move, that removes one
ply, you play a move, that removes another.  There are 8 plies of stuff left.
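The disagreement here is a one-line subtraction: each half-move played consumes one ply of the stored search (the numbers below are from the exchange above):

```python
def plies_left(search_depth, half_moves_played):
    # Each half-move (one move by either side) peels one ply off the
    # search that was stored in the hash table.
    return search_depth - half_moves_played

print(plies_left(10, 2))  # 8: a depth-10 search after both sides move
print(plies_left(6, 1))   # 5: Vincent's depth-6 case after one move
```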



>
>>But we can solve this easier.  Lets take the first move out of book for all 6
>>games.  And compute the branching factor for those.  No hash there.
>
>openings position is the most dangerous position to draw conclusions from.
>
>If you would have studied the log files you would have known that already
>at game start they start to search. So at move 1 they already search.
>
>They don't start to search first move out of book.
>
>So that's not relevant here.
>
>First move out of book is not relevant at all. They didn't clear hash
>of course. You don't already search at move 1 if it isn't giving a speedup.
>
>I am considering using the same aproach with diep, because when being
>parallel it's important to get some things out of hashtable as then you
>can put the both processors quicker to work, where now the parallel speedup
>first 20 seconds of diep is horrifying. To get from 1.2 to 1.8 takes
>half a minute!!!!!


I don't have that problem, myself.  Here are a couple of tests:

1 processor, 8.5 seconds, 4 processors, 2.3 seconds.

1 processor, 4.03 seconds, 4 processors, 1.22 seconds.

So not _everybody_ has the problems you have.  And if not _everybody_ has
those problems, then there is nothing that says Deep Blue had to have them
either...

I will be happy to give you the raw data for the above, or let you run the
tests yourself, or tell you the positions I used so you can run them on another
machine if you want.
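For reference, the parallel speedups those two timings imply (times taken from the text; the helper is just the standard serial/parallel ratio):

```python
def speedup(serial_time, parallel_time):
    # Classic parallel speedup: one-processor time over N-processor time.
    return serial_time / parallel_time

# The two 1-vs-4 processor tests quoted above:
print(round(speedup(8.5, 2.3), 2))    # ~3.7 out of a possible 4.0
print(round(speedup(4.03, 1.22), 2))  # ~3.3 out of a possible 4.0
```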

>
>Lose that much in overhead that to get near to 2.0 takes
>up to 5 minutes sometimes.


That is a problem in your implementation, which doesn't mean it is a problem
_everyone_ has to have.



>
>Now if i would have 480 processors, then it's even more important to
>quickly get processors to work. So having a few ply out of hashtable then
>which directly then can put a bunch of processors to work, instead of
>only 16 (Young brother wait, probably heard of it?), then i would
>go for that too.

Again, for me that would not be a problem.  The first thing Hsu did when he
started his parallel search was to ask for a copy of my dissertation, which I
sent him.  I doubt he did anything worse than what I did.







Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.