Computer Chess Club Archives

Subject: Re: hardware math

Author: Robert Hyatt
Date: 19:09:20 10/11/02
On October 11, 2002 at 13:04:11, Vincent Diepeveen wrote:

>On October 11, 2002 at 12:12:18, Robert Hyatt wrote:
>
>>On October 11, 2002 at 11:07:52, Vincent Diepeveen wrote:
>>
>>>On October 11, 2002 at 10:38:12, Jeremiah Penery wrote:
>>>
>>>>Hmm, let's see.  If DB gets "upgraded to 2002 standards", that would mean they
>>>>can make a fully custom .13 micron chip running at 300MHz, able to do a full
>>>>evaluation every clock cycle.  It will also have 20GB/s memory bandwidth to
>>>>256MB of RAM for the hash tables on the board.  So one single chip will search
>>>>300M positions/second, and they can do whatever evaluation they want.  Yes, yes,
>>>>obviously a 'complete joke'.
>>>
>>>I'm more afraid of Brutus in a 30MHz FPGA or so than I am of a
>>>Deep Blue at 0.13 micron.
>>>
>>>First of all, Deep Blue wasn't written in Verilog or any 'high level'
>>>language. It was simply cutting and pasting the logic blocks together.
>
>>Vincent, Hsu used "Project MOSIS" funded by the NSF, to produce his chess chips
>>for the first few versions.
>
>>Please do a web search on MOSIS and look at their requirements as to how you
>>submit something to them to fabricate.
>>
>>Then come back and say "my statement was stupid, I learned a new word 'verilog'
>>and used it without knowing what I was talking about."  Because that statement
>>is completely true.  You _must_ use software design tools to submit something
>>to MOSIS.  You think you just give them a picture and say "fab me one of these?"
>
>>Stick to an area where you have at least some small idea of what you are talking
>>about.  If you want proof that they used MOSIS, read his book.  Or ask him
>>directly.  But please stop spewing random noise.
>
>What I meant is that he would need a complete redesign of the old chips
>to get them to 0.13, and that is before we even talk about the year or 5
>needed to improve his evaluation function. From my draughts program I know
>how slowly improving an evaluation function goes if you do not know shit
>about the game in question, despite having someone near me (2 streets away:
>Marcel Monteba) who is very knowledgeable in draughts.
>
>>>
>>>So it would require an entirely new design to make something for 0.13
>>>in Verilog or whatever.
>>>
>>>Secondly, that 0.13 process technology, including the big salary for Hsu,
>>>would be around 20 million in investment.
>>
>>No it wouldn't.  Hsu would not need to do anything other than re-do the design
>>and submit it to an existing fab shop to produce the chips.  It wouldn't be
>>cheap, but it wouldn't cost a fortune either.  The only cost would be Hsu's
>>salary, and the fab cost for a run of N chips, where N would probably need to
>>be at least 1000.  I don't claim to have an idea of what the cost would be, as
>>IC fabrication is not something I follow closely.  But it _would_ be fast as
>>all hell, because rather than 20MHz they could go 100X faster with no problems
>>at all, and probably do a better design since the DB chips had to make
>>concessions for routing and gate delays that could be better handled today.
>
>I'm not assuming the 'university' deal here, but a commercial release
>of his CPU. Of course he would need to press like 100,000 CPUs or so.
>At say 50 dollars a CPU, that is 5 million dollars. To start with.

You don't have to do a quantity that large.  You fab what you need and pay the
price that quantity dictates.  And it assumes a market that will buy the chips
so that the fab cost becomes irrelevant...



>
>That's excluding of course paying for the machines, the expensive software
>to recalculate all his stuff for the 0.13 process and his own salary for
>5 years as well as many advisors and other idiots who want to eat from the
>project.

It wouldn't take him 5 years.  It took him less than one year to completely
re-do the DB1 chip and get DB2 chips back and test them.  Not 5 years...


>
>You get to 10+ MLN directly and go to 20MLN dollar pretty quick.
>
>On the other hand, instead of begging many companies for sponsorship
>each year, he could also buy an FPGA board for a couple of thousand,
>rewrite his stuff in Verilog or whatever language you want, and
>he would need just half an hour to synthesize the code to the cpu.

And be lucky to get 1/50th the speed, which is _the_ point...



>
>It's ready to get tested then within half an hour after a source code
>change!
>
>Way less expensive than the 0.13 option :)
>
>So who is stopping Hsu from putting it into FPGA?
>
>If I understand correctly, it's peanuts to get it to 30MHz there and
>not so expensive to buy a few more cards. Total budget perhaps 10000 dollars.

His old processors ran at 24MHz.  30MHz?  How would that help him?
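
The clock arithmetic being argued over in this thread can be sketched in a few lines. This is only a back-of-the-envelope model, assuming node rate is simply clock rate divided by clocks per node. The function name `nodes_per_second` is made up for illustration; the 24MHz, 30MHz and 300MHz figures, the "10 clocks a node" estimate and the "full evaluation every clock cycle" assumption are the ones quoted in the thread.

```python
MHZ = 1_000_000


def nodes_per_second(clock_hz, clocks_per_node):
    """Toy model: one node is finished every `clocks_per_node` clock cycles."""
    return clock_hz / clocks_per_node


# 300MHz at 10 clocks per node (the estimate quoted in the thread).
slow_estimate = nodes_per_second(300 * MHZ, 10)

# 300MHz with one full evaluation per clock (the other quoted assumption).
fast_estimate = nodes_per_second(300 * MHZ, 1)

# A 30MHz FPGA versus the old 24MHz chips, same clocks per node assumed.
fpga_speedup = (30 * MHZ) / (24 * MHZ)

print(f"{slow_estimate / 1e6:.0f}M nodes/s at 10 clocks per node")
print(f"{fast_estimate / 1e6:.0f}M nodes/s at 1 clock per node")
print(f"30MHz vs 24MHz speedup: {fpga_speedup:.2f}x")
```

Under this model the move from 24MHz to 30MHz buys only a factor of 1.25, which is the point of the question above.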


>
>Why beg for so much money to make ASIC CPUs?
>
>Let him prove how the thing plays first against other programs using
>a single 20-24 or even 30MHz FPGA CPU.
>
>No need for 20 million for a 0.13 release.
>
>>>This versus an FPGA board with some tools you can get for a couple of thousand
>>>euros (1 euro is about 1 dollar at the moment).
>>>
>>>Further, Hsu would have to prove a number of things,
>>>   such as being capable of implementing all kinds of things like
>>>   nullmove, efficient move ordering, and a lot of evaluative
>>>   things in hardware. It's not trivial to add RAM to the
>>>   chip, because a single cacheline from RAM is a lot slower than
>>>   processing a bunch of nodes in hardware. If you run at 300MHz
>>>   with say 10 clocks a node on average, you can achieve about
>>>   30 million nodes a second.
>>
>>It is trivial to add RAM to a chip.  SRAM.  With today's densities, significant
>>SRAM is common.  L1/L2/L3 cache comes to mind.
>>
>>
>>>
>>>   However, you can't do 30 million random word lookups a second in
>>>   the RAM. Latency is too high for that. It's not trivial to combine
>>>   the 2 things.
>>>
>>
>>
>>
>>No, but you _can_ do asynchronous lookups.  Start the probe, continue the
>>search.  When the probe result is ready, it can either be ignored if it is
>>worthless, or you can back up to the point you did the probe and use the info.
>>Have you even read any of the literature on distributed hash tables in
>>distributed chess engines?
>>
>>I didn't think so...
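
The asynchronous lookup idea described above (start the probe, keep searching, then either ignore the answer or back up and use it) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not how Deep Blue, Crafty or any real engine does it: time is a simple tick counter, searching one child costs one tick, and the names `SlowHashTable`, `PendingProbe` and `search_node` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingProbe:
    key: int
    ready_at: int  # simulated tick at which the answer arrives


class SlowHashTable:
    """Hash table whose answers arrive `latency` ticks after the request."""

    def __init__(self, latency: int) -> None:
        self.latency = latency
        self.entries: Dict[int, int] = {}

    def start_probe(self, key: int, now: int) -> PendingProbe:
        return PendingProbe(key, now + self.latency)

    def poll(self, probe: PendingProbe, now: int) -> Tuple[bool, Optional[int]]:
        if now < probe.ready_at:
            return False, None  # answer not back yet
        return True, self.entries.get(probe.key)


def search_node(table: SlowHashTable, key: int,
                child_scores: List[int]) -> Tuple[Optional[int], int, bool]:
    """Return (score, ticks_spent, used_hash) for one node."""
    now = 0
    probe: Optional[PendingProbe] = table.start_probe(key, now)
    best: Optional[int] = None
    for score in child_scores:
        if probe is not None:
            ready, stored = table.poll(probe, now)
            if ready:
                if stored is not None:
                    # Useful answer: back up to the probe point and use it.
                    return stored, now, True
                probe = None  # worthless answer: ignore it, keep searching
        now += 1  # "search" one child; costs one tick in this toy model
        best = score if best is None else max(best, score)
    return best, now, False


table = SlowHashTable(latency=2)
table.entries[42] = 7
print(search_node(table, 42, [1, 2, 3, 4, 5]))  # (7, 2, True)
print(search_node(table, 99, [1, 2, 3, 4, 5]))  # (5, 5, False)
```

In the first call the stored score arrives after two children have been searched and the remaining three are skipped. In the second call the probe misses and the search simply runs to completion, having lost nothing by not waiting.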
>
>We discussed this extensively a year ago or so, in case you forgot.
>My question to you is then: if hashtables slow down crafty so much,
>why aren't you doing this in crafty?

I answered this simply.  If I hash in the q-search, it slows the search 10%.  It
makes the tree 10% smaller.  A complete wash.  I choose to _not_ hash in the
q-search because it reduces the load on the hash table, letting me get by with
a smaller table without losing efficiency.

If you want to compare the speed of a q-search hasher with one that doesn't,
let me know.  I can give you the code to test...
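
The "complete wash" arithmetic can be checked with a rough model in which time to finish a search is nodes searched divided by nodes per second. This is a minimal sketch assuming that model holds; the helper name `relative_time` is made up for illustration, and the only inputs are the 10% figures and the 1.6M and 1.4M nodes-per-second rates mentioned in this thread.

```python
def relative_time(tree_factor, speed_factor):
    """Search time relative to baseline, assuming time = nodes / nps.

    tree_factor:  new tree size divided by old tree size
    speed_factor: new nodes per second divided by old nodes per second
    """
    return tree_factor / speed_factor


# Tree 10% smaller, search 10% slower: the two effects cancel.
wash = relative_time(0.90, 0.90)

# Same 10% smaller tree, but using the rough 1.6M -> 1.4M nps figures.
measured = relative_time(0.90, 1.4 / 1.6)

print(f"10% smaller tree, 10% slower search: {wash:.3f}x time")
print(f"10% smaller tree, 1.6M -> 1.4M nps:  {measured:.3f}x time")
```

With the rounded 10% figures the ratio is 1.0. With the rough 1.4M figure it comes out a few percent above 1.0, so by this model hashing in the q-search buys nothing in time to depth either way.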




>
>Right, it's combining the worst of both worlds :)
>
>>
>>
>>
>>
>>
>>>   In fact crafty with 1 million nodes a second can't even do all requests
>>>   to a hashtable.
>>
>>What on earth does that mean?  It _does_ do them.  I've even been testing
>>hashing in the q-search again, just to see if it is worthwhile after a few
>>years of not doing it.  It slowed me down about 10%.  It made the tree about
>>10% smaller.  On my quad xeon I went from 1.6M nodes per second to 1.4M
>>roughly.
>>
>>So I have no idea what the above statement you made means...
>
>>Cray Blitz probed _everywhere_ and it had no problem running at 5-7M nodes per
>>second on a T932...
>
>Don't compare supercomputers with a program on a single cpu and a single
>memory controller.
>
>>
>>
>>
>>>
>>>An important point in the end is the price at which this all gets produced,
>>>because you need to sell a bunch of these processors, or you won't get
>>>back that $20 million of investment.
>>>
>>>And in the end, when the cpu hits the market after say a year or 5,
>>>then I'll have a 4-processor 10GHz Intel/AMD machine delivering
>>>millions of nodes a second for DIEP :)
>>>
>>>>>Of course it gets completely annihilated when appearing in 1997 standards.
>>>>>
>>>>>So if Hsu upgrades his chip to a single cpu chip with a new and better
>>>>>evaluation (it's of course questionable whether he is capable of
>>>>>managing that) then it will not search deeper than deep blue in 1997
>>>>>of course, unless he adds nullmove and hashtables.
>>>>
>>>>The above paragraph has no basis in reality.
Last modified: Thu, 15 Apr 21 08:11:13 -0700