Computer Chess Club Archives



Subject: Re: Status of Brutus?

Author: Vincent Diepeveen

Date: 10:36:09 07/28/03



On July 27, 2003 at 14:42:46, Keith Evans wrote:

>On July 27, 2003 at 07:37:31, Vincent Diepeveen wrote:
>
>>On July 27, 2003 at 06:31:58, Jonas Bylund wrote:
>>
>>>On July 26, 2003 at 17:22:02, Russell Reagan wrote:
>>>
>>>>On July 26, 2003 at 16:25:37, O. Veli wrote:
>>>>
>>>>>Since it is hardware, can
>>>>>we expect to be stronger than top software?
>>>>
>>>>I would expect it to be slower than top software, because cpu improvements
>>>>happen so quickly, and FPGA programming (from what I've heard) is not a simple
>>>>task. If he spends another two years working on it before releasing it (as
>>>>Slater said), just imagine how much faster the cpus will be by then.
>>>>
>>>>If you're talking about something massively parallel like Deep Blue, that is one
>>>>thing, but a single PCI card? I doubt that is going to do any better than break
>>>>even with top of the line hardware, so why bother? IBM threw so much hardware at
>>>>the problem that desktop cpu improvements wouldn't catch up for a LONG time, but
>>>>a single PCI card doesn't seem to be worth the trouble of programming the thing,
>>>>because desktop/server cpus will probably outperform it before too long.
>>>
>>>The way i understand it, the whole idea with running FPGA is that no matter how
>>>much knowledge you add, you won't lose speed, will that not more than compensate
>>>for the PC programs gain through faster hardware?
>>
>>Quote from Chrilly Donninger, Paderborn, February a few years ago ('98, or my
>>memory now says '99):
>>  "I do not believe in knowledge at all Vincent. You are taking the wrong path.
>>Nimzo in fact only grew stronger when i REMOVED knowledge from it".
>>
>>I cannot possibly believe that someone who always follows simple solutions
>>manages to put a lot of knowledge in hardware, where 'a lot' is measured by
>>2300 FM standards.
>
>I think that the point of doing an FPGA engine is that you're planning on adding
>more knowledge than the software only solutions have, or you're trying to run at
>a higher NPS with equivalent knowledge. If you took almost all knowledge out

Let's be clear about the disadvantages here:
  a) an FPGA always has a MHz disadvantage compared to software, for sure.
  b) very bad move ordering, which always makes your search need
     exponentially more nodes.
  c) it is a hell of a lot of work to get it working on an FPGA,
     and it is incompatible.
  d) software you can sell to a lot of people for a fair price.
  e) you must become world champion and play Kasparov, otherwise you will not
     sell many cards.
  f) lacking hashtables (too expensive to put on a card meant for sale) means
     your efficiency is very low.
  g) the limiting factor for a search is the number of search requests the
     PCI bus can handle. The reason is that you can put more in the software
     if that goes faster.
  h) testing an FPGA card is harder, unless you plan to hardly modify
     the program each time.

The advantages are:
  a) you can parallelize knowledge.
  b) if you sell such a card, no one can copy it illegally.
  c) earnings per version are bigger, and if you sell a couple of thousand
     then chessbase already earns millions from it.

So in the long run FPGA always loses to software, for sure from a money
viewpoint. If Brutus doesn't win the 2003 world championship, it is history.

>except for material count/piece square tables, then a software only solution on
>today's top CPUs will probably run at a similar speed to an FPGA implementation.
>My guess would be that today's FPGAs would run at somewhere between 2-5 million
>nodes/s with a Belle style move generator depending upon how much effort one
>spends doing a good place and route to maximize the operational frequency.
>(Maybe there's a better way to implement a move generator, but I don't know it.
>Also maybe I'm a bit pessimistic about the operating frequency.)
>
>Given how large the Xilinx Virtex2 FPGAs have gotten, it's a good time to start
>doing some experiments if one is interested in this sort of thing. (Especially
>if the price is irrelevant.)

Price is relevant. Chessbase isn't investing in FPGA without expecting to get
its money back.

If price is irrelevant you of course buy your own supercomputer and use that
thing. It saves years of time. Turning Diep from a parallel program into a
supercomputer program on a supercomputer is 1 year of fulltime work, way less
than the many years Donninger needs for Brutus. The advantage is that my thing
then keeps running unmodified on NUMA systems, whereas with Brutus each new
modification must be made in the Verilog code, which you then recompile and
test, etc. It goes very slowly.

>With the right FPGA platform you could implement a decent move ordering, and

No, you cannot develop good move ordering. In fact you can only use a kind of
near-random move ordering in FPGA. The reason is that the FPGA is clocked very
low.

Donninger assumes about 7 clocks a node.

If you add killer moves then it is 8 clocks a node, right?
If you add SEE then it is 9 clocks a node, right?
If you add psq move ordering like Ed describes, it is 10 clocks a node.

So each thing you do adds a clock.

So your program needs about 43% more time per node (10 clocks instead of 7)
directly from a few small improvements in move ordering.

Same thing for extensions etc.

Every feature you add in search is a clock extra because search is sequential!
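
A quick way to check the clock arithmetic above, as a minimal Python sketch. The
7 clocks a node and the cost of one clock per feature are the figures assumed in
this post; the function names are made up for illustration, and a real design
might overlap some of this work.

```python
# Base cost per node assumed in the post (Donninger's figure, roughly).
BASE_CLOCKS = 7

def extra_time_per_node(extra_clocks, base_clocks=BASE_CLOCKS):
    """Fraction of additional time per node caused by the extra clocks."""
    return extra_clocks / base_clocks

def relative_speed(extra_clocks, base_clocks=BASE_CLOCKS):
    """Nodes per second relative to the base design at the same clock rate."""
    return base_clocks / (base_clocks + extra_clocks)

# Features named in the post, each assumed to cost one sequential clock.
features = ["killer moves", "SEE", "psq move ordering"]

for count, name in enumerate(features, start=1):
    print(
        f"+{name}: {BASE_CLOCKS + count} clocks a node, "
        f"{extra_time_per_node(count):.0%} more time per node, "
        f"{relative_speed(count):.0%} of the original nps"
    )
```

The point of the sketch is only that the cost is linear in the number of
sequential steps: every extra clock is paid at every node.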

We haven't discussed hashtables yet.

Suppose we add the hashtable move, which would improve move ordering most.

First of all, using it in move ordering takes 1 clock extra. So that is
directly a slowdown of about 15% compared to the current situation.

Then we have the RAM latency. Of course SRAM is too expensive and we would need
too much of it to be affordable. Let's assume DDR ram. About 2 GB.

You need a chipset then or something. The difficult stuff they do on the
Opteron is something only the best hardware designers get to work.

I am not sure how this works in FPGA. Probably it is very hard to accomplish if
your FPGA is clocked at a different speed than the RAM.

Let's assume a latency of 280 ns. At 40 million nps on a 133 MHz card of the
future, 1 node takes 25 ns. So waiting for the RAM makes you really, really
slow.
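
To put numbers on that, here is a minimal Python sketch. The 280 ns latency and
the 40 million nps are the figures assumed above; the function names are
hypothetical, and the last function assumes the worst case where every node
waits for one full probe with no overlap or pipelining.

```python
def ns_per_node(nps):
    """Time budget per node in nanoseconds at a given nodes-per-second rate."""
    return 1e9 / nps

def probe_cost_in_nodes(latency_ns, nps):
    """How many nodes could have been searched while one probe is pending."""
    return latency_ns / ns_per_node(nps)

def nps_if_every_node_waits(latency_ns, nps):
    """Worst-case nps when each node stalls for one full RAM access."""
    return 1e9 / (ns_per_node(nps) + latency_ns)

LATENCY_NS = 280        # RAM latency assumed in the post
CARD_NPS = 40_000_000   # nps of the hypothetical future card

print(f"time per node:      {ns_per_node(CARD_NPS):.1f} ns")
print(f"one probe costs:    {probe_cost_in_nodes(LATENCY_NS, CARD_NPS):.1f} nodes")
print(f"nps if all stalled: {nps_if_every_node_waits(LATENCY_NS, CARD_NPS):,.0f}")
```

So under these assumptions a single probe costs as much time as searching
about 11 nodes.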

Where do hashtables give you the biggest savings? Right near the leaves!!
Of course the cutoffs near the root make the branching factor better there, but
the biggest savings for Diep are near the leaves.

If you consider that the searches in hardware are only 2 or 3 ply, then it
is trivial to see what the problem is here.

Trivially, the hashtable only benefits you a lot if you use it everywhere.

But the chip gets too many nps to use hashtables. Period.

More important is price. It is too expensive to mass produce!

>have quick access to hash tables without worrying about things like TLBs. With
>an FPGA only platform (no memories connected to the FPGA) it's going to be hard
>to get a decent branching factor.

Shredder nowadays searches something like 19 ply. Of course that's with massive
forward pruning, but it's 19 ply! Every mainline is like 22 ply or something.
Really massively deep.

Of course there is overhead, but let's just express it in %.

b.f. observed by Fritz: 2.8 (shredder even less than that).

b.f. without hashtables: 3.5

We are talking about a full search in hardware now of course, because it is
already obvious nowadays that in the future a hardware/software combination
cannot have a major advantage over PCs.

Within a few years the FPGA needs to be faster by a factor of:
 3.5^19 / 2.8^19 = 69

40 million / 69 = 0.6 million

So the break-even point is around 0.6 million nodes a second.

Shredder is slightly above that nps, of course without forward pruning shredder
would be massively above that nps (single cpu).
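
The break-even estimate above can be reproduced with a minimal Python sketch.
The branching factors (3.5 and 2.8), the 19 ply depth and the 40 million nps
are the numbers from this post; the function names are hypothetical, and the
simple model of tree size as b.f. to the power of depth is the same
simplification the post uses.

```python
def speedup_needed(bf_without_hash, bf_with_hash, depth):
    """Factor by which the tree grows at `depth` ply with the worse b.f."""
    return (bf_without_hash / bf_with_hash) ** depth

def break_even_nps(hardware_nps, bf_without_hash, bf_with_hash, depth):
    """Software nps that reaches `depth` as fast as the hash-less hardware."""
    return hardware_nps / speedup_needed(bf_without_hash, bf_with_hash, depth)

BF_WITHOUT_HASH = 3.5     # b.f. without hashtables, from the post
BF_WITH_HASH = 2.8        # b.f. observed for Fritz, from the post
DEPTH = 19                # ply
HARDWARE_NPS = 40_000_000

factor = speedup_needed(BF_WITHOUT_HASH, BF_WITH_HASH, DEPTH)
software_nps = break_even_nps(HARDWARE_NPS, BF_WITHOUT_HASH, BF_WITH_HASH, DEPTH)

print(f"required factor: {factor:.1f}")
print(f"break-even nps:  {software_nps:,.0f}")
```

Because the factor is exponential in depth, it keeps growing as searches get
deeper, which is why the gap widens over the years.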

>Plus doesn't this add some excitement to computer chess? Some people want a

Sure, it is a great thing that Chrilly is making this card, if only to see how
Deep Blue would do if a lot of commercial work were put into it right now.

Note that I am sure that Hsu would never have managed to even get close to
Brutus standards.

Just compare the design of Hsu's move generator with Chrilly's. It is a
*massive* difference already.

>standard platform for competitions, but I think that it was more exciting when
>you had a really wide variety of platforms. Even though I'm not a big fan of
>Chrilly, I have to applaud Chessbase and Chrilly for taking a novel approach.
>(Yeah - they're not the first...)

I do not know exactly how much money is paid to Chrilly to create it. I
guess they got the cards for free from some sponsor, like they always get
hardware for free.

But I am sure it is worth it for the realistic chance that Brutus wins a world
title this year, with a great Kure book and the well-known Donninger way of
scoring points. Then it only takes a match and you can sell a lot of cards.

The question is whether those cards will be at the same level as Brutus.

I bet hundreds of points weaker!

>-K




Last modified: Thu, 15 Apr 21 08:11:13 -0700
