Computer Chess Club Archives

Subject: Re: DB NPS (anyone know the position used)?

Author: Robert Hyatt
Date: 10:26:55 01/27/00
On January 27, 2000 at 00:34:01, Eugene Nalimov wrote:

>On January 26, 2000 at 23:53:49, Robert Hyatt wrote:
>
>>On January 26, 2000 at 20:03:56, Peter W. Gillgasch wrote:
>>
>>>
>>>This is what I think too. Even if there is still a significant clock
>>>cycle difference between "cheap" and "expensive" nodes, they can overlap
>>>the first FIND-AGGRESSOR / FIND-VICTIM operation with the slow eval
>>>cycle. Does anyone care to post the actual cycle counts given in IEEE Micro?
>>>
>>>Anyway, I still believe that the argument regarding the fail-low q-nodes
>>>is bogus, because it assumes that the rate of fail-low q-nodes after a
>>>10-ply full-width search plus extensions and a 4-ply hardware search (is
>>>this a q-search only?)
>>
>>No. This is a full-width search with extensions, plus a q-search. It was
>>basically the same idea as in the Belle machine, except collapsed onto
>>one chip.
>>
>>
>>
>>> multiplied by the clock cycle difference between the
>>>slow and the fast eval can differ so greatly that it is not washed out
>>>by the move generation and tree traversal overhead cycles, which are
>>>probably on the order of 1+1+2+2+2 = 8 cycles for each "cheap" cutoff.
>>
>>Also remember that this is synchronous logic. The fast eval can give a quick
>>exit, but as I understand it, it still takes 10 cycles to exit. He was very
>>specific in saying that the 24MHz processors searched exactly 2.4M nodes per
>>second and the 20MHz processors searched exactly 2M nodes per second. Both
>>figures work out to exactly 10 clock cycles per node, which suggests that the
>>fast eval, slow eval, and other stuff are done in parallel and used as
>>needed... It would be harder to design a piece of hardware that had a variable
>>number of cycles per node without microprogramming the thing...
>
>I don't see why you would have a fast eval if it takes the same number of
>clocks. Why not just use the "standard" one?
>
>Eugene
>

Because at the point where the fast eval is finished, you can make the
decision to quit. You might not be able to quit "early", but if you use
only the full eval, you have to make the decision _after_ it is finished,
which costs cycles beyond the point where the fast result was available.

Just a guess.

For example, think about how cache and even TLB lookups are done. You probe
the cache and start the memory read in parallel. If the cache hits, you abort
the memory operation quickly enough not to disturb the memory controller at
all, but if it misses, you haven't added any delay, because the memory read
is already in the pipe...
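To make the cache analogy concrete, here is a minimal Python sketch with
made-up latencies (the 2-cycle cache and 20-cycle memory figures, and the
function names, are hypothetical and not from this thread). It compares
probing the cache first and only then starting the memory read, against
starting both at once and cancelling the memory read on a hit:

```python
# Hypothetical latencies in clock cycles; illustrative only, not real hardware figures.
CACHE_LATENCY = 2
MEMORY_LATENCY = 20


def serial_probe(cache_hit: bool) -> int:
    """Probe the cache first; start the memory read only after a miss is known."""
    if cache_hit:
        return CACHE_LATENCY
    return CACHE_LATENCY + MEMORY_LATENCY


def parallel_probe(cache_hit: bool) -> int:
    """Start the cache probe and the memory read together; abort memory on a hit."""
    if cache_hit:
        return CACHE_LATENCY  # memory read is cancelled, so it costs no time
    return MEMORY_LATENCY  # cache probe overlapped the memory read, so it added nothing


for hit in (True, False):
    print(f"hit={hit}: serial={serial_probe(hit)} parallel={parallel_probe(hit)}")
```

On a hit both cost the same; on a miss the parallel version pays only the
memory latency, because the cache probe was hidden underneath it (this
assumes the cache answers no later than memory would).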

I always think of point A as the beginning (the CPU, in the cache example)
and point B as the place we are trying to get to (memory, in the cache
example). If you add _anything_ sequentially between two stages on the path
from A to B, A and B get further apart. If you add something in parallel with
work already being done between A and B, there is no apparent cost, as long
as it finishes no later than the work it runs alongside.
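The same point-A-to-point-B rule can be written as a tiny critical-path
calculation: stages in sequence add their latencies, while stages running
side by side cost only as much as the slowest of them. The sketch below
applies it to a fast eval running alongside a slow eval; the stage names and
cycle counts are hypothetical, chosen only to illustrate the rule, and are not
Deep Blue's actual figures.

```python
def sequential(*latencies: int) -> int:
    """Stages one after another: each one pushes A and B further apart."""
    return sum(latencies)


def parallel(*latencies: int) -> int:
    """Stages side by side: only the slowest one lies on the A-to-B path."""
    return max(latencies)


# Hypothetical per-node cycle counts (illustrative only).
MOVE_GEN = 4
SLOW_EVAL = 6
FAST_EVAL = 2

baseline = sequential(MOVE_GEN, SLOW_EVAL)
fast_eval_in_series = sequential(MOVE_GEN, FAST_EVAL, SLOW_EVAL)
fast_eval_in_parallel = sequential(MOVE_GEN, parallel(FAST_EVAL, SLOW_EVAL))

print(baseline, fast_eval_in_series, fast_eval_in_parallel)  # 10 12 10
```

With these numbers, adding the fast eval in series costs 2 extra cycles per
node, while adding it in parallel costs nothing.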

That is how I assume much of the DB logic works, as that was how Belle worked
in essence...



>>>
>>>Ernst's whole proposition implicitly assumes that their on-chip move
>>>ordering is all messed up and that their slow eval is painfully slow
>>>compared with the move generation and tree traversal clock cycles.
>>>
>>>I don't buy that.
>>>
>>>-- Peter
>>>
>>>>This would change if some of this stuff backs up into the software part of
>>>>the search, of course...  But we seem to be talking only about the q-search
>>>>as implemented in hardware, and every node saved is N nanoseconds saved, period.
>>>
>>>Bob, I really hate it when we share the same opinion 8^)
>>>
>>>>(N is roughly 500: exactly 500 for the 20MHz processors, a little less for
>>>>the 24MHz processors).
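The timing figures quoted in this thread reduce to simple arithmetic. Here is
a minimal Python sketch, assuming the fixed 10 cycles per node described
above (the function names are mine, for illustration only). It checks that
20MHz and 24MHz chips give exactly 2M and 2.4M nodes per second, works out N
in nanoseconds per node, and totals Peter's estimated 1+1+2+2+2 cycles for a
"cheap" cutoff.

```python
CYCLES_PER_NODE = 10  # fixed node cost assumed in the thread


def nodes_per_second(clock_hz: int, cycles_per_node: int = CYCLES_PER_NODE) -> float:
    return clock_hz / cycles_per_node


def ns_per_node(clock_hz: int, cycles_per_node: int = CYCLES_PER_NODE) -> float:
    # Multiply before dividing so the 20MHz case comes out to exactly 500.0.
    return cycles_per_node * 1_000_000_000 / clock_hz


for mhz in (20, 24):
    hz = mhz * 1_000_000
    print(f"{mhz}MHz: {nodes_per_second(hz):,.0f} nodes/s, {ns_per_node(hz):.1f} ns/node")

# Peter's rough per-stage estimate for a "cheap" cutoff, in cycles.
cheap_cutoff_cycles = sum([1, 1, 2, 2, 2])
print(f"cheap cutoff: {cheap_cutoff_cycles} cycles")
```

This gives N = 500 ns at 20MHz and about 417 ns at 24MHz, which is the
"a little less" mentioned above.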
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.