Computer Chess Club Archives


Subject: Re: Hydra node speed from CSS forum

Author: Robert Hyatt

Date: 08:29:38 09/01/04


On September 01, 2004 at 00:36:27, Tony Werten wrote:

>On August 31, 2004 at 10:52:58, Robert Hyatt wrote:
>
>>On August 31, 2004 at 02:39:29, Tony Werten wrote:
>>
>>>On August 30, 2004 at 17:43:23, Robert Hyatt wrote:
>>>
>>>>On August 30, 2004 at 17:14:32, Gerd Isenberg wrote:
>>>>
>>>>>On August 30, 2004 at 16:57:50, Robert Hyatt wrote:
>>>>>
>>>>>>On August 30, 2004 at 16:39:22, Mark Young wrote:
>>>>>>
>>>>>>>On August 30, 2004 at 15:33:19, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On August 30, 2004 at 14:51:01, Uri Blass wrote:
>>>>>>>>
>>>>>>>>>On August 30, 2004 at 13:51:48, Robert Hyatt wrote:
>>>>>>>>>
>>>>>>>>>>On August 30, 2004 at 12:24:54, Volker Böhm wrote:
>>>>>>>>>>
>>>>>>>>>>>On August 30, 2004 at 10:02:54, Robert Hyatt wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On August 30, 2004 at 08:30:34, Kurt Utzinger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>On August 30, 2004 at 08:12:52, Jouni Uski wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>An FPGA card currently examines about 3 million positions/second. 16
>>>>>>>>>>>>>>cards therefore theoretically make 48 MPos/sec. (Donninger)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Jouni
>>>>>>>>>>>>>
>>>>>>>>>>>>>      If Hydra made 48 Mpos/sec, this again proves (in comparison
>>>>>>>>>>>>>      with Shredder's 2 Mpos/sec on a quad-Opteron server with
>>>>>>>>>>>>>      4 CPUs) that the number of pos/sec can't be taken as a
>>>>>>>>>>>>>      reliable measure of the strength of a chess program. It's
>>>>>>>>>>>>>      of course simply impossible to compare apples and oranges.
>>>>>>>>>>>>>      Kurt [http://www.utzingerk.com]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>Don't forget that Hydra ripped Shredder's head off.  So the NPS _might_ be
>>>>>>>>>>>>significant here...
>>>>>>>>>>>
>>>>>>>>>>>Didn't I hear you say that 10 games are not enough to draw a
>>>>>>>>>>>statistically significant conclusion about playing strength?
>>>>>>>>>>>
>>>>>>>>>>>Greetings Volker
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>With two _close_ opponents, correct.  But if one is seriously stronger, as hydra
>>>>>>>>>>appeared to be, 10 games is plenty.
>>>>>>>>>
>>>>>>>>>We do not know if hydra is seriously stronger.
>>>>>>>>
>>>>>>>>We have a pretty good clue that it is.  It is over 10x faster, potentially, than
>>>>>>>>other programs.
>>>>>>>>
>>>>>>>>1. I first assume that the programmer / designer is no dummy.
>>>>>>>>
>>>>>>>>2.  all else being "equal" 10x faster is a _serious_ advantage.
>>>>>>>>
>>>>>>>>3.  the above two points translate into a significant strength advantage.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>You cannot start by assuming that hydra is significantly stronger when this is
>>>>>>>>>the question.
>>>>>>>>
>>>>>>>>With evidence, you can.  IE I can certainly assume that Crafty on an 8-way
>>>>>>>>opteron is significantly stronger than Crafty on my dual xeon.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>If you see 10-0, you can say based on the result that Hydra is significantly
>>>>>>>>>stronger, but when you see 5.5-2.5 you cannot claim that based on the result;
>>>>>>>>>you can only say that the result does not tell you whether it is
>>>>>>>>>significantly stronger.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>If you only look at the results, maybe or maybe not.  But I watched many of the
>>>>>>>>games with Crafty analyzing.  That tells you even more.
>>>>>>>
>>>>>>>Common sense should tell us that Hydra is stronger. It should have a big
>>>>>>>hardware advantage. I think this is your point, and I agree. But I still need
>>>>>>>more data to be sure. Right now there is only a 1 in 6 chance that Hydra is the
>>>>>>>stronger program based on the games alone.
>>>>>>
>>>>>>Where does "one in six" come from?
>>>>>>
>>>>>>IE
>>>>>>
>>>>>>1.  Hydra is at least 10x faster
>>>>>>
>>>>>>2.  It won three and drew five, if I recall, which is 5.5/8.0, which is right
>>>>>>at +200 on the Elo scale.
>>>>>>
>>>>>>Yes the number of games is low, but the 1 in 6 seems very weak.  IE when you
>>>>>>consider _everything_.  And don't forget that older versions (Brutus in
>>>>>>particular) played on ICC, and the current version is playing on playchess.
>>>>>>There are a lot of games.  It is nowhere near invincible.  But it is _very_
>>>>>>strong compared to other programs.
>>>>>>
>>>>>>Of course I am not _sure_.  But I am fairly well convinced. :)
>>>>>
>>>>>Yes, but Hydra had the advantage of being able to prepare and tune against a
>>>>>publicly available Shredder, while Shredder did not have that chance.
>>>>>
>>>>>Anyway, I am curious about Hydra's further speedups, and "fear" they will
>>>>>dominate the scene for some time.
>>>>>
>>>>>The FPGAs they currently use are still not the fastest.
>>>>>FPGAs may have more room for future speed growth than general-purpose hardware
>>>>>for some time: more and "wider" chess "ALUs" for evaluation purposes.
>>>>>PCI Express will speed up the hardware-software communication.
>>>>>Even if it is not efficient, an additional ply in hardware is reasonable given
>>>>>the possible PCI bottleneck.
>>>>>
>>>>>Do you have an idea about the parallel speedup of this 8*2 cluster, about six?
>>>>
>>>>
>>>>I really can't make an educated guess.
>>>>
>>>>A cluster will be less efficient than a pure SMP box.  But to take this in
>>>>"pieces" we might get a number that is reasonable.
>>>>
>>>>IE 8 nodes.  I'd think that if it was 4x faster with 8 nodes that would be a
>>>>win.  Since the nodes have to talk via message-passing, sharing the hash table
>>>>will be a problem.  4x faster would be worth doing.
>>>>
>>>>Inside a node, there are two FPGA boards as I understand this.  I don't know how
>>>>much of the search is done in one of these FPGA boards, but whatever it is,
>>>>there is an efficiency loss with no hashing going on.  Of course I don't hash in
>>>>the q-search, and I believe Amir has said he doesn't hash in the last normal ply
>>>>of his search either, so perhaps this is not a killer.  The main problem is that
>>>>this hardware apparently has to do a fixed-depth search, which limits split
>>>>points and accumulates significant amounts of "idle time" (ie if it is necessary
>>>>to only do hardware searches at ply=N, sometimes ply=N is an ALL node and the
>>>>parallel search will work well, other times it is a CUT node and parallel search
>>>>won't work at all).  What they lose to deal with this I don't know.  But I'd
>>>>guess that 1.5x would be a number I would be happy with.
>>>
>>>Chrilly told me they use a slightly variable depth to solve this, i.e. add 1 ply
>>>to the hardware search on odd plies (or even?) and hand the software search off
>>>to the cards 1 ply earlier.
>>>
>>>Tony
>>>
>>
>>That sounds like a solution with even more problems.  IE one ply deeper in
>>hardware is one ply of "simpler" search, further exacerbated by no hash tables
>>in the hardware.  IE the best approach for a limited hardware search is to tack
>>it on the end of the search using as few plies as possible since those plies are
>>inferior to the software search plies...
>
>Yes, maybe they lower the hardware depth and send it to the cards 1 ply later?
>So 1 ply more in software. Anyway, I was told the hardware search depth was 2 or
>3 ply, depending on the nodetype.
>

That sounds like yet another problem.  IE if you reduce the hardware depth by 1
ply, you multiply the number of hardware searches required by some large
constant, since each hardware search finishes that much faster.  That was a real DB problem, because
there is a "perfect balance point" between the software and hardware search,
where the software wants to search deep enough that it can produce enough
positions quickly enough to keep the hardware search busy, while not searching
so deeply in the hardware that the hardware lags behind the software search.

There really is one "right point" when you have a hardware/software bottleneck
like the PCI bus.
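A toy throughput model makes the "balance point" concrete. This is a minimal sketch under loudly stated assumptions, not how DB or Hydra actually schedule work: the effective branching factor `b`, the per-board speed `hw_nps`, the number of boards, and a fixed software-plus-PCI cost per hand-off `sw_cost_per_job` are all hypothetical parameters, and the function names are made up. It picks the shallowest hardware depth that keeps the boards busy, since hardware plies are the weaker ones.

```python
# Toy model of the software/hardware balance point.
# All parameters are hypothetical assumptions, not DB or Hydra figures.

def utilization(d, b, hw_nps, boards, sw_cost_per_job):
    """Return (hardware_busy, software_busy) fractions when the software
    hands off searches with remaining depth d to the hardware."""
    job_time = b ** d / hw_nps            # seconds one board spends per job
    supply = 1.0 / sw_cost_per_job        # jobs/s the software can produce
    demand = boards / job_time            # jobs/s the boards can absorb
    if supply < demand:                   # software is the bottleneck
        return supply / demand, 1.0
    return 1.0, demand / supply           # hardware is the bottleneck


def pick_handoff_depth(b, hw_nps, boards, sw_cost_per_job, max_depth=10):
    """Shallowest hardware depth that keeps every board fully busy."""
    for d in range(1, max_depth + 1):
        hw_busy, _ = utilization(d, b, hw_nps, boards, sw_cost_per_job)
        if hw_busy >= 1.0:
            return d
    return max_depth


if __name__ == "__main__":
    # Hypothetical numbers: b=6, 3M nodes/s per board, 2 boards, 50 us per job.
    for d in range(1, 6):
        hw, sw = utilization(d, 6, 3_000_000, 2, 50e-6)
        print(f"depth {d}: hardware busy {hw:.2f}, software busy {sw:.2f}")
    print("chosen depth:", pick_handoff_depth(6, 3_000_000, 2, 50e-6))
```

With these made-up numbers the model picks a hardware depth of 4; dropping to 3 leaves the boards idle about 28% of the time, while going to 5 leaves the software waiting most of the time, which is the imbalance described above.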

>>
>>DB had a solution to this, as it supported hash table probes in the hardware,
>>but Hsu never had time to design/build the multi-ported memory it required.  But
>>he knew it was a long-term issue that needed a solution obviously...
>>
>>One other note.  Even "adding a ply" won't solve the CUT/ALL problem.  For
>>example, take Crafty and pick some remaining depth point, such as 3 plies, and
>>for an even or odd iteration, count the number of times that depth is a CUT or
>>ALL node.  It will vary wildly, independently of the iteration depth, because of
>>search extensions, move ordering, etc...
>
>Hmm, I'm not sure. My own experience is that extensions tend to stabilize this
>variation, i.e. the odd/even effect (which is another result of the same problem, I
>think) gets smaller when you extend more.


I'm not talking about odd/even score changes and depth variations.  I'm talking
about the fact that if you pick positions where remaining depth = some constant
number (DB used 4-5-6-7 based on their logs) to select the point where you hand
the search off to the hardware, sometimes those plies are "ALL" (good) and
sometimes they will be "CUT" (bad).  If you wait a ply to hand things off to the
hardware search, you have to reduce the hardware search depth by 1, which can
lead to a software/hardware load imbalance.  Hand it off one ply early and the
same thing happens...
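Why a fixed remaining-depth hand-off lands on CUT nodes some of the time and ALL nodes the rest of the time can be illustrated with the usual PV/CUT/ALL classification under perfect move ordering. This is a minimal sketch: the path encoding (`child_index`, `extension` pairs) and the function names are hypothetical, and a real search only knows these node types as expectations that move-ordering failures change further.

```python
from enum import Enum


class NodeType(Enum):
    PV = "PV"
    CUT = "CUT"
    ALL = "ALL"


def child_type(parent, child_index):
    """Expected type of a child node under perfect move ordering."""
    if parent is NodeType.PV:
        # The first move continues the PV; the remaining moves are expected
        # to be refuted, so those children are CUT nodes.
        return NodeType.PV if child_index == 0 else NodeType.CUT
    if parent is NodeType.CUT:
        return NodeType.ALL    # the refuting move leads to an ALL node
    return NodeType.CUT        # every child of an ALL node is a CUT node


def type_at_handoff(iteration_depth, handoff_depth, path):
    """Follow `path` (a list of (child_index, extension) pairs) from the root
    and return (ply, node type) where the remaining depth first reaches
    handoff_depth, i.e. where the search would go to the hardware."""
    node, remaining = NodeType.PV, iteration_depth
    for ply, (idx, ext) in enumerate(path, start=1):
        node = child_type(node, idx)
        remaining -= 1 - ext           # an extension cancels the depth decrement
        if remaining <= handoff_depth:
            return ply, node
    raise ValueError("path ends before the hand-off depth")


if __name__ == "__main__":
    plain = [(1, 0), (0, 0), (0, 0), (0, 0)]
    extended = [(1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]   # one extension at ply 2
    print(type_at_handoff(8, 4, plain))
    print(type_at_handoff(8, 4, extended))
```

Same iteration depth and same hand-off depth: one extension on the path pushes the hand-off point a ply deeper and flips it from an ALL node to a CUT node, independently of whether the iteration is odd or even.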





>
>Tony
>
>>
>>
>>
>>
>>
>>
>>>>
>>>>That makes your 6X faster (rather than 16X, the theoretical max) a fairly decent
>>>>guess, although it _is_ still a guess.  But 6x times 3M nodes per second (18M)
>>>>is still faster than any current program on 4-way opterons.  I've passed that
>>>>speed on 8-way boxes (and beyond) but that hardware is pretty rare to come by.
>>>>They are no doubt very strong.
>>>>
>>>>
>>>>
>>>>
>>>>>I have no idea how well these message-passing clusters scale with huge numbers
>>>>>of nodes (32 or more, e.g. 256, 512, or 1024). Vincent was pessimistic about
>>>>>that, iirc. But it seems that internode bandwidth and latency also have some
>>>>>potential to keep improving, making bigger clusters more efficient.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Uri
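The 5.5/8 match score discussed in the thread can be converted into an implied rating difference with the standard logistic Elo model. A minimal sketch; the function names are illustrative:

```python
import math


def expected_score(diff):
    """Expected score for the side rated `diff` points higher (logistic Elo)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_difference(points, games):
    """Rating difference implied by scoring `points` out of `games`."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("a 0% or 100% score implies an unbounded difference")
    return -400.0 * math.log10(1.0 / p - 1.0)


if __name__ == "__main__":
    print(round(elo_difference(5.5, 8)))   # 3 wins, 5 draws, 0 losses
    print(round(expected_score(200), 3))   # score needed for +200
```

Under this model 5.5/8 (68.75%) works out to roughly +137, and +200 corresponds to a score of about 76%. With only eight games, the error bars on either figure are wide.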



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.