Computer Chess Club Archives


Subject: Re: Hydra node speed from CSS forum

Author: Tony Werten

Date: 21:36:27 08/31/04


On August 31, 2004 at 10:52:58, Robert Hyatt wrote:

>On August 31, 2004 at 02:39:29, Tony Werten wrote:
>
>>On August 30, 2004 at 17:43:23, Robert Hyatt wrote:
>>
>>>On August 30, 2004 at 17:14:32, Gerd Isenberg wrote:
>>>
>>>>On August 30, 2004 at 16:57:50, Robert Hyatt wrote:
>>>>
>>>>>On August 30, 2004 at 16:39:22, Mark Young wrote:
>>>>>
>>>>>>On August 30, 2004 at 15:33:19, Robert Hyatt wrote:
>>>>>>
>>>>>>>On August 30, 2004 at 14:51:01, Uri Blass wrote:
>>>>>>>
>>>>>>>>On August 30, 2004 at 13:51:48, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On August 30, 2004 at 12:24:54, Volker Böhm wrote:
>>>>>>>>>
>>>>>>>>>>On August 30, 2004 at 10:02:54, Robert Hyatt wrote:
>>>>>>>>>>
>>>>>>>>>>>On August 30, 2004 at 08:30:34, Kurt Utzinger wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On August 30, 2004 at 08:12:52, Jouni Uski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>An FPGA card currently examines about 3 million positions/second. 16
>>>>>>>>>>>>>cards therefore theoretically make 48 MPos/sec. (Donninger)
>>>>>>>>>>>>>
>>>>>>>>>>>>>Jouni
>>>>>>>>>>>>
>>>>>>>>>>>>      If Hydra made 48 Mpos/sec this again proves (in comparison
>>>>>>>>>>>>      with the 2 Mpos/sec on Quad-Opteron server with 4 CPU's of
>>>>>>>>>>>>      Shredder) that the number of pos/sec can't be taken as a
>>>>>>>>>>>>      reliable value for the goodness of a chess program. It's
>>>>>>>>>>>>      of course simply impossible to compare apples and oranges.
>>>>>>>>>>>>      Kurt [http://www.utzingerk.com]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Don't forget that Hydra ripped Shredder's head off.  So the NPS _might_ be
>>>>>>>>>>>significant here...
>>>>>>>>>>
>>>>>>>>>>Didn't I hear you say that 10 games are not enough to draw a
>>>>>>>>>>statistically significant conclusion about playing strength?
>>>>>>>>>>
>>>>>>>>>>Greetings Volker
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>With two _close_ opponents, correct.  But if one is seriously stronger, as hydra
>>>>>>>>>appeared to be, 10 games is plenty.
>>>>>>>>
>>>>>>>>We do not know if hydra is seriously stronger.
>>>>>>>
>>>>>>>We have a pretty good clue that it is.  It is over 10x faster, potentially, than
>>>>>>>other programs.
>>>>>>>
>>>>>>>1. I first assume that the programmer / designer is no dummy.
>>>>>>>
>>>>>>>2.  all else being "equal" 10x faster is a _serious_ advantage.
>>>>>>>
>>>>>>>3.  the above two points translate into a significant strength advantage.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>You cannot start by assuming that hydra is significantly stronger when this is
>>>>>>>>the question.
>>>>>>>
>>>>>>>With evidence, you can.  IE I can certainly assume that Crafty on an 8-way
>>>>>>>opteron is significantly stronger than Crafty on my dual xeon.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>If you see 10-0 you can say based on the result that Hydra is significantly
>>>>>>>>stronger but when you see 5.5-2.5 you cannot claim it based on the result and
>>>>>>>>you only can say that you do not know if it is significantly stronger based on
>>>>>>>>the result.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>If you only look at the results, maybe or maybe not.  But I watched many of the
>>>>>>>games with Crafty analyzing.  That tells you even more.
>>>>>>
>>>>>>Common sense should tell us that Hydra is stronger. It should have a big
>>>>>>hardware advantage. I think this is your point, and I agree. But I still need
>>>>>>more data to be sure. Right now there is only a 1 in 6 chance that Hydra is the
>>>>>>stronger program based on the games alone.
>>>>>
>>>>>Where does "one in six" come from?
>>>>>
>>>>>IE
>>>>>
>>>>>1.  Hydra is at least 10x faster
>>>>>
>>>>>2.  It won three and drew five if I recall.  which is 5.5/8.0.  which is right
>>>>>at +200 on the Elo scale.
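
As a rough check on the numbers being argued over here, a minimal sketch in Python. It assumes the standard logistic Elo expectation formula and, for the significance question, a plain one-sided sign test that ignores draws. The function names `elo_diff` and `sign_test_p` are illustrative and do not come from the thread. Under these assumptions 5.5/8 works out to roughly +137 rather than +200, and three wins with no losses has a 1 in 8 chance of arising between equal opponents. Other rating tables or tests would give somewhat different figures.

```python
import math

def elo_diff(score_fraction):
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def sign_test_p(wins, losses):
    """One-sided sign test: chance of at least `wins` wins in the decisive
    games if both sides were equally strong (draws are ignored)."""
    decisive = wins + losses
    favourable = sum(math.comb(decisive, k) for k in range(wins, decisive + 1))
    return favourable / 2 ** decisive

if __name__ == "__main__":
    # Match result as recalled in the post: 3 wins, 5 draws, 0 losses.
    wins, draws, losses = 3, 5, 0
    fraction = (wins + 0.5 * draws) / (wins + draws + losses)
    print(f"score fraction: {fraction:.4f}")
    print(f"implied Elo difference: {elo_diff(fraction):+.0f}")
    print(f"sign-test p-value: {sign_test_p(wins, losses):.3f}")
```
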
>>>>>
>>>>>Yes the number of games is low, but the 1 in 6 seems very weak.  IE when you
>>>>>consider _everything_.  And don't forget that older versions (Brutus in
>>>>>particular) played on ICC, and the current version is playing on playchess.
>>>>>There are a lot of games.  It is nowhere near invincible.  But it is _very_
>>>>>strong compared to other programs.
>>>>>
>>>>>Of course I am not _sure_.  But I am fairly well convinced. :)
>>>>
>>>>Yes, but Hydra had the advantage of preparing and tuning against a publicly
>>>>available Shredder, while Shredder had no such chance.
>>>>
>>>>Anyway i am curious about Hydra's further speedups, and "fear" they will
>>>>dominate the scene for some time.
>>>>
>>>>The FPGAs they currently use are still not the fastest available.
>>>>FPGAs may keep gaining speed faster than general-purpose hardware for some
>>>>time, allowing more and "wider" chess "ALUs" for evaluation purposes.
>>>>PCI Express will speed up the hardware-software communication.
>>>>Even if it is not efficient, an additional ply in hardware is reasonable with
>>>>respect to the possible PCI bottleneck.
>>>>
>>>>Do you have an idea about the parallel speedup of this 8*2 cluster, about six?
>>>
>>>
>>>I really can't make an educated guess.
>>>
>>>A cluster will be less efficient than a pure SMP box.  But to take this in
>>>"pieces" we might get a number that is reasonable.
>>>
>>>IE 8 nodes.  I'd think that if it was 4x faster with 8 nodes that would be a
>>>win.  Since the nodes have to talk via message-passing, sharing the hash table
>>>will be a problem.  4x faster would be worth doing.
>>>
>>>Inside a node, there are two FPGA boards as I understand this.  I don't know how
>>>much of the search is done in one of these FPGA boards, but whatever it is,
>>>there is an efficiency loss with no hashing going on.  Of course I don't hash in
>>>the q-search, and I believe Amir has said he doesn't hash in the last normal ply
>>>of his search either, so perhaps this is not a killer.  The main problem is that
>>>this hardware apparently has to do a fixed-depth search, which limits split
>>>points and accumulates significant amounts of "idle time" (ie if it is necessary
>>>to only do hardware searches at ply=N, sometimes ply=N is an ALL node and the
>>>parallel search will work well, other times it is a CUT node and parallel search
>>>won't work at all).  What they lose to deal with this I don't know.  But I'd
>>>guess that 1.5x would be a number I would be happy with.
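
The back-of-the-envelope estimate above can be written out as a few lines of Python. This is only a sketch of the arithmetic: the 3M positions/second per card comes from the Donninger quote, while the 4x (across 8 nodes) and 1.5x (across 2 boards in a node) figures are the guesses stated in the post, not measurements. The function and parameter names are made up for illustration.

```python
def cluster_estimate(card_nps=3_000_000, nodes=8, cards_per_node=2,
                     node_speedup=4.0, card_speedup=1.5):
    """Combine the guessed per-level speedups into an overall estimate."""
    cards = nodes * cards_per_node
    peak_nps = card_nps * cards              # theoretical maximum, all cards fully busy
    speedup = node_speedup * card_speedup    # guessed effective speedup over one card
    effective_nps = card_nps * speedup       # estimated useful search speed
    efficiency = speedup / cards             # fraction of the theoretical maximum
    return peak_nps, speedup, effective_nps, efficiency

if __name__ == "__main__":
    peak, speedup, effective, efficiency = cluster_estimate()
    print(f"theoretical peak: {peak / 1e6:.0f} Mpos/sec")
    print(f"guessed speedup:  {speedup:.1f}x of {8 * 2} cards")
    print(f"effective speed:  {effective / 1e6:.0f} Mpos/sec")
    print(f"efficiency:       {efficiency:.1%}")
```

Changing either guessed factor shows how sensitive the final figure is to the parallel efficiency, which is the real unknown in this discussion.
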
>>
>>Chrilly told me they use a slightly variable depth to solve this, ie add 1 ply
>>to the hardware search on odd plies (or even?) and hand the software search to
>>the cards 1 ply earlier.
>>
>>Tony
>>
>
>That sounds like a solution with even more problems.  IE one ply deeper in
>hardware is one ply of "simpler" search, further exacerbated by no hash tables
>in the hardware.  IE the best approach for a limited hardware search is to tack
>it on the end of the search using as few plies as possible since those plies are
>inferior to the software search plies...

Yes, maybe they lower the hardware depth and send it to the cards 1 ply later?
So 1 ply more in software. Anyway, I was told the hardware search depth was 2 or
3 ply, depending on the node type.

>
>DB had a solution to this, as it supported hash table probes in the hardware,
>but Hsu never had time to design/build the multi-ported memory it required.  But
>he knew it was a long-term issue that needed a solution obviously...
>
>One other note.  Even "adding a ply" won't solve the CUT/ALL problem.  For
>example, take Crafty and pick some remaining depth point, such as 3 plies, and
>for an even or odd iteration, count the number of times that depth is a CUT or
>ALL node.  It will vary wildly, independently of the iteration depth, because of
>search extensions, move ordering, etc...
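
The counting experiment described above can be sketched on a toy tree in Python. This is not Crafty: it assumes a uniform tree with pseudo-random leaf scores, plain fail-soft negamax alpha-beta, no extensions, no hashing and no move ordering, so it only illustrates how the tally at a fixed remaining depth might be collected. All names (`leaf_value`, `search`, `classify`) are hypothetical.

```python
import random
from collections import Counter

INF = 10**9

def leaf_value(path):
    # Deterministic pseudo-random score for a leaf, keyed on the move path.
    return random.Random(hash(path)).randint(-100, 100)

def search(path, depth, alpha, beta, counts, branching=4):
    """Fail-soft negamax that records each interior node's type by remaining depth."""
    if depth == 0:
        return leaf_value(path)
    original_alpha = alpha
    best = -INF
    for move in range(branching):
        score = -search(path + (move,), depth - 1, -beta, -alpha, counts, branching)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # fail high: remaining moves are not searched
    if best >= beta:
        kind = "CUT"   # a move refuted the position
    elif best <= original_alpha:
        kind = "ALL"   # every move was searched and none raised alpha
    else:
        kind = "PV"    # score fell strictly inside the window
    counts[(depth, kind)] += 1
    return best

def classify(iteration_depth, probe_depth=3):
    """Count PV/CUT/ALL nodes seen at `probe_depth` plies from the leaves."""
    counts = Counter()
    search((), iteration_depth, -INF, INF, counts)
    return {kind: counts[(probe_depth, kind)] for kind in ("PV", "CUT", "ALL")}

if __name__ == "__main__":
    for iteration_depth in (5, 6, 7, 8):
        print(iteration_depth, classify(iteration_depth))
```

In a real engine the same counters would go at the point where the search returns from a node, which is where extensions and move ordering would show their effect on the CUT/ALL mix.
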

Hmm, I'm not sure. My own experience is that extensions tend to stabilize this
variation, ie the odd/even effect (which is another result of the same problem, I
think) gets less when you extend more.

Tony

>
>
>
>
>
>
>>>
>>>That makes your 6X faster (rather than 16X the theoretical max) a fairly decent
>>>guess, although it _is_ still a guess.  But 6x times 3M nodes per second is
>>>still faster than any current programs on 4-way opterons.  I've passed that
>>>speed on 8-way boxes (and beyond) but that hardware is pretty rare to come by.
>>>They are no doubt very strong.
>>>
>>>
>>>
>>>
>>>>I have no idea how well these message-passing clusters scale with a huge number of
>>>>nodes >= 32 or 256, 512 or 1024. Vincent was pessimistic about that, iirc. But it
>>>>seems that internode bandwidth and latency also have some potential to become
>>>>faster and faster, making bigger clusters more efficient.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.