Author: Robert Hyatt
Date: 08:54:48 06/25/98
On June 25, 1998 at 06:27:58, Vincent Diepeveen wrote:
>
>On June 24, 1998 at 17:46:39, Robert Hyatt wrote:
>
>>On June 24, 1998 at 16:50:10, Vincent Diepeveen wrote:
>>
>>>
>>>On June 24, 1998 at 13:49:04, Robert Hyatt wrote:
>>>
>>>>On June 24, 1998 at 13:33:41, Keith Ian Price wrote:
>>>>
>>>>>On June 24, 1998 at 09:07:50, Ernst A. Heinz wrote:
>>>>>
>>>>>> Hot Chips 10 Advance Program
>>>>>>
>>>>>> August 16-18, 1998
>>>>>> Memorial Auditorium, Stanford University
>>>>>> Palo Alto, California
>>>>>>
>>>>>>[...]
>>>>>>
>>>>>>2:30-4:00  Session 4: Specialized Chips    Alan Smith, chair
>>>>>>
>>>>>>Designing a Single Chip Chess Grandmaster While Knowing Nothing about Chess
>>>>>> Feng-hsiung Hsu, IBM T. J. Watson Research Center
>>>>>>
>>>>>>[...]
>>>>>>
>>>>>>Any computer-chess enthusiasts from the US going there ... ?
>>>>>>
>>>>>>=Ernst=
>>>>>
>>>>>
>>>>>I wonder if it will just be his standard 1.5 hour talk with a new title for the
>>>>>theme of the symposium. I could get down there relatively easily, if I thought
>>>>>he could say anything in depth in 1.5 hours. I would guess it is just the same
>>>>>presentation, though.
>>>>>
>>>>>kp
>>>>
>>>>
>>>>
>>>>I'd suspect it is different, based on the conference. But it also might not
>>>>have much of the data we'd like to see. I.e., I suspect that it will be less about
>>>>chess, and more about how the hardware was designed to do certain time-critical
>>>>functions efficiently, and how architectural problems were addressed over the
>>>>evolution from Chiptest to Deep Blue II.
>>>>
>>>>Would still be interesting to hear, but more from a hardware perspective, based
>>>>on the "hot chips" title...
>>>
>>>
>>>
>>>A few newbie questions for Hsu; maybe someone can ask them, or some of
>>>them. A first attempt at a question list for Hsu:
>>>
>>> -how big is the hash table on that processor, or didn't he implement
>>>  one on it at all?
>>>
>>
>>Not sure... each group of chess processors has a shared hash, but each
>>group can't see the hash from any other group. No way to do a 256-port
>>shared memory...
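To picture it, think of something like this (a toy sketch in C, not their actual layout; the group count, table size and entry format here are all made up):

#include <stdint.h>

#define GROUPS  16            /* made-up number of processor groups           */
#define ENTRIES (1 << 15)     /* made-up entries in each group's table        */

typedef struct { uint64_t key; int16_t score; int8_t depth; } HashEntry;

static HashEntry hash[GROUPS][ENTRIES];   /* one private table per group      */

/* a processor in group g probes only its own group's table; there is simply
   no path to another group's entries                                         */
HashEntry *probe(int g, uint64_t zobrist_key) {
    return &hash[g][zobrist_key & (ENTRIES - 1)];
}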
>>
>>
>>> -why do a fixed search depth on a processor? sounds very stupid to me
>>
>>they don't... they just do the *last* 4 plies of the search, plus the
>>quiescence, in their hardware. This is dictated not by the speed of the
>>chess processors by themselves, but by how quickly the IBM SP2 front-end
>>can feed them positions to search. If you reduce this depth to 3 plies,
>>the chess processors outrun the SP2 processors and have to wait. If you
>>increase the depth to 5 plies, the SP2 processors overrun the chess
>>processors and they have to wait. It's a balancing act.
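Back-of-the-envelope, with made-up numbers (only the 2.4M nodes/sec figure comes from the discussion), a toy C program like the one below shows why 4 is the sweet spot: at depth 3 the chips finish far faster than they can be fed, at depth 5 the front-end backs up, and depth 4 lands roughly in balance.

#include <stdio.h>

int main(void) {
    double nps       = 2400000.0; /* nodes/sec one chess processor searches    */
    double branching = 6.0;       /* made-up effective branching factor        */
    double feed_rate = 1800.0;    /* made-up positions/sec the SP2 can feed    */
                                  /* ONE chess processor                       */
    for (int depth = 3; depth <= 5; depth++) {
        double nodes = 1.0;
        for (int d = 0; d < depth; d++) nodes *= branching;   /* ~b^depth nodes */
        printf("depth %d: chip finishes %8.0f searches/sec, SP2 feeds %.0f/sec\n",
               depth, nps / nodes, feed_rate);
    }
    return 0;
}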
>>
>>
>>>
>>> -how fast is communication between his design and the mainframe/supercomputer?
>>
>>hardware cycle time. *very* fast, just like shared memory.
>
>So Hsu needs to clear this up himself.
>
I don't know what you mean. The SP uses the usual "memory-mapped" I/O
facility. To give something to a chess processor, you simply store that
"something" in the right memory address and the chess processor instantly
has it. As I said, "just like shared memory" so we are talking nanoseconds
to send something to a chess processor, actually somewhat faster than a PC
can store something in real memory.
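From the software side it is nothing more than a store. A rough sketch in C (the base address and the position layout are invented for illustration; the real mapping is whatever the SP's device mapping says it is):

#include <stdint.h>

/* invented address and packing, just to show the idea of memory-mapped I/O   */
#define CHESS_CHIP_BASE ((volatile uint32_t *) 0xF0000000UL)

typedef struct { uint32_t words[16]; } Position;   /* packed board, made up   */

/* "sending" a position is just storing it at the mapped address; the chess
   processor sees it as soon as the stores complete                           */
static void send_position(const Position *p) {
    for (int i = 0; i < 16; i++)
        CHESS_CHIP_BASE[i] = p->words[i];
}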
>>>
>>> -how many cycles does an evaluation take on his single-chip hardware?
>>
>>the chess processors can run at 24MHz, and search 2.4M nodes per second each.
>>So that factors into 10 clock cycles to do *everything* from generating moves,
>>making moves, handling alpha/beta, doing the fast and slow hardware evals
>>and so forth... all in 10 cycles.
>
>This means they can't have much general knowledge: such rules cannot be done
>in parallel, but only after the other knowledge.
again, your lack of understanding doesn't necessarily translate into
something *they* can't do.
>
>So they must lack masses of general rules if an eval is done in 10 clocks,
>because you *CANNOT* design them to run in parallel.
certainly you can. You simply need to read Ken Thompson's exposé on how
he designed Belle. As soon as you update the board, with the MakeMove()
facility, the eval starts *in parallel*. By the time you need it, it is
ready. 10 clocks is enough time to compute a "tree" of 512 eval terms
and collapse them into one value via an adder tree: you have 512 things
that you can calculate in parallel, then take those in pairs and massage
them into 256 values, then take those in pairs and massage them into 128
values, and so forth. Notice that each of these 512 terms could have
dozens of positional patterns that are recognized and massaged.
Then the 512 "chunks" can interact with and influence any of the other 512
chunks in any way you'd care to design. Which I'll bet is 100x more
complex than anything you do, or will do, in your program. You try to
impose your lack of hardware experience, which leads you to define limits
that don't exist, into what Hsu and company can do. They've only been
doing hardware design for 15 years now. I suspect they have a clue about
what they can and can't do, wouldn't you think?
You need to study hardware architecture and design before you make such
"sweeping" statements. It's not difficult at all. In each of those
512 "chunks" they could evaluate dozens of different things, all in
parallel, in 512 parallel "pipes". Then they start using these "endpoints"
in any way they want, such as using a "weak pawn" endpoint and letting it
interact with any of the other endpoints. If you can do it in software,
I can do it in hardware, particularly easy when using a "silicon compiler"
and using on-chip RAM to hold positional weights so the hardware isn't too
"fixed". They can even modify the "patterns" they evaluate using this same
RAM trick..
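If you want the software picture of that adder tree, it is something like the sketch below (an illustration of the idea only, not Hsu's design; the term values are placeholders). 512 terms computed in parallel collapse pairwise, 512 -> 256 -> 128 -> ... -> 1, which is only log2(512) = 9 levels of adders, and the weights behind the terms can sit in that on-chip RAM so the patterns stay adjustable.

#include <stdio.h>

#define TERMS 512             /* leaf eval terms, all computed in parallel in hardware */

/* collapse 512 scores into one via a pairwise adder tree:
   512 -> 256 -> 128 -> ... -> 1, i.e. 9 levels of adders.
   in hardware every addition on a level happens at the same time;
   this loop just mimics the wiring sequentially.                   */
int adder_tree(int term[TERMS]) {
    int n = TERMS;
    while (n > 1) {
        for (int i = 0; i < n / 2; i++)
            term[i] = term[2 * i] + term[2 * i + 1];
        n /= 2;               /* one level of the tree per pass      */
    }
    return term[0];
}

int main(void) {
    int term[TERMS];
    for (int i = 0; i < TERMS; i++)
        term[i] = (i % 7) - 3;    /* placeholder "pattern" scores    */
    printf("eval = %d\n", adder_tree(term));
    return 0;
}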
>
>You need *RESULTS* from other parts of your evaluation first
see above. Trivial to do.
>
>I admit that I don't have too many general rules yet, but it's steadily
>growing.
>
>>> -what micron technology, 0.60?
>>> -what megahertz speed does the CPU run at?
>
>>chess processors at 24MHz. Not sure about the SP2 they used, maybe
>>300MHz per processor.
>
>I'm interested in the micron technology, because we know how expensive
>the hardware for that whole project was. The SP2 is far from interesting,
>because it doesn't do the 4 leaf plies, which are the most important.
>
>Note that I would be happy with the SP2 myself. I would get that 14 ply anyway
>on such a machine; no need for special hardware processors.
you are very naive too. Whatever you would get, they would get 10,000
times as much *with* their hardware... you simply keep overlooking the
obvious.
>
>>>
>>> -how to design on CPU's knowledge depending things which depend on
>>> other knowledge in a smart and lossless way speed it up?
>>
>>
>>cannot parse the above.. :)
>
>Yes, therefore I ask it of Hsu.
now that I understand what you mean, I answered that above. There is
*nothing* that prevents inter-related knowledge terms in hardware, any
more than there is anything that prevents them in software. Here's the
tenet of a hardware person:
"*anything* you can do in software, I can do in hardware, and I can
do it at least 100X faster, because I am *not* going to be wasting
time fetching/decoding instructions."
>
>>>
>>> -how many transistors on CPU (or does this sound cruel?)?
>>
>>
>>I saw this number, but can't recall for the life of me. Enough that
>>they actually put some 3 piece endgame databases right on the chip to
>>finish filling it out...
>
>Numbers, no vague memories needed.
someone posted the numbers here, although I see no reason why that is
important at all.
>
>>> -how many dollars does it cost to press one CPU when pressing, say, 10,000?
>>
>>not super expensive, although I have not seen a price quote. They
>>used project "MOSIS" for the fabrication work, and a silicon compiler
>>to design the thing... That's all I recall...
>
>Prices. If you write down the price, people might ask: why not press a
>few and put them on a cheap PCI/AGP card which is ready to use?
>
>Let me guess: 1 dollar per processor?
>
Vincent, please, please get off your "I know all about this stuff and
I know it can't be done." I'm reminded of the college graduate that
was talking to his best friend:
"when I graduated from high school, my dad was about the stupidest,
most conservative person I ever knew. You know, it is *amazing* how
much *he* learned while *I* was in college."
the hardware cost is but one part of *any* chip's price. You also factor
in the research and development expenses, and that is *always* the largest
chunk of the cost. *by* far. So your question about "how much to press
a chip" makes no sense... they have 10 years of time invested in that
chip. I'd assume they would want to get a little of that back.
>>>Greetings,
>>>Vincent