Computer Chess Club Archives



Subject: Re: Challenge to show the audience a DB example

Author: Robert Hyatt

Date: 07:41:24 06/27/98



On June 27, 1998 at 09:53:00, Vincent Diepeveen wrote:

>
>On June 26, 1998 at 17:19:51, Robert Hyatt wrote:
>
>>On June 26, 1998 at 05:52:15, Vincent Diepeveen wrote:
>>
>>>
>>>On June 26, 1998 at 05:08:22, Roberto Waldteufel wrote:
>>>
>>>>
>>>>On June 25, 1998 at 20:52:40, Vincent Diepeveen wrote:
>>>>
>>>>>
>>>>>On June 25, 1998 at 12:15:09, Robert Hyatt wrote:
>>>>>
>>>>>>On June 25, 1998 at 08:07:53, Roberto Waldteufel wrote:
>>>>>>
>>>>>>>
>>>>>>>On June 25, 1998 at 06:35:40, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>On June 25, 1998 at 04:54:02, Roberto Waldteufel wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Here's another question about Hsu's chess chip. I seem to recall reading some
>>>>>>>>>time ago that Hsu was considering a commercial release of his chip. Does anyone
>>>>>>>>>know anything more about this? If the chip were to become available, how could I
>>>>>>>>>use it in conjunction with a PC? Would the fixed depth not be "out of sync" with
>>>>>>>>>the speed of, e.g., a Pentium at 333 MHz if it was designed to work with a
>>>>>>>>>supercomputer, or can the fixed depth be adjusted to redo the balancing act in
>>>>>>>>>the new environment? If it were possible, I would be very interested in
>>>>>>>>>experimenting with this sort of hardware coupling. I assume that it would extend
>>>>>>>>>the depth to which a program could search by something like 4 extra plies within
>>>>>>>>>the same time. This would surely improve the strength of the PC chess programs
>>>>>>>>>quite a lot!
>>>>>>>>>
>>>>>>>>>Roberto
>>>>>>>>
>>>>>>>>You see it wrong. If you do 4-ply searches without hash etc., then
>>>>>>>>2.5 million nodes a second drops quickly to, say, 300k nodes a second.
>>>>>>>>
>>>>>>>>So in fact you're playing against a kind of Fritz 5, which DOES search
>>>>>>>>all leaves full-width, which gives you some extra tactics. So against
>>>>>>>>commercial programs, which are only tested on the same hardware and are only
>>>>>>>>busy with out-booking and trying to finish the game by means of tactics,
>>>>>>>>you win by big numbers, but it will play horribly.
>>>>>>>>
>>>>>>>>Vincent
>>>>>>>
>>>>>>>Hi Vincent,
>>>>>>>
>>>>>>>I'm not sure I understand - can you explain what you mean? I do use hash tables -
>>>>>>>would the chip not have access to my system's RAM? And when you say it would
>>>>>>>play horribly, why so? How could the chip make it play any worse than without
>>>>>>>the chip?
>>>>>>>
>>>>>>>Best wishes,
>>>>>>>
>>>>>>>Roberto
>>>>>>
>>>>>>
>>>>>>you have to ignore part of what Vincent writes, because he is an "anti-Deep-Blue"
>>>>>>person from way back.  A couple of key points:
>>>>>>
>>>>>>1.  No, a chess processor could not see your RAM.  But they do have their own
>>>>>>hash table memory.
>>>>>>
>>>>>>2.  A PC program with one of these chips attached would be far and away stronger
>>>>>>than any existing computer chess program running on a PC.  A chess board that
>>>>>>they used had 8 processors on it.  A PC could use the same approach, and search
>>>>>>at about 20 million nodes per second.
>>>>>>
>>>>>>I pointed out to Vincent last week here that he was not doing well playing
>>>>>>against Crafty.  He pointed out (factually, by the way) that Crafty has a
>>>>>>hardware advantage with the 4 processor ALR.  But he's going to have to take
>>>>>>a stand on one side of the fence or the other.  Either speed is important, and
>>>>>>his note about me having faster hardware does explain why crafty wins most of
>>>>>>the games played on ICC, or speed is not important and doesn't explain crafty's
>>>>>>win/lose ratio vs the various DiepX programs on ICC.
>>>>>>
>>>>>>I believe in a combination of speed and intelligence.  DB has that.  A PC card
>>>>>>with one or more DB processors would *still* have that.  And it would produce
>>>>>>unheard-of performance figures for a PC-based chess program.  A la Deep Blue
>>>>>>Junior, which is basically a PC with a chess board (actually an IBM single-CPU
>>>>>>workstation with a VME-bus chess card, so don't start hoping to buy one of those
>>>>>>and plug it into your PCI bus).  :)
>>>>>>
>>>>>>Bob
>>>>>
>>>>>You have to ignore most of what Bob writes about Deep Blue, as he is a
>>>>>known pro-deep-blue authority.
>>>>
>>>>
>>>>If Bob is an authority on DB, then why should I ignore him? I have always found
>>>>Bob to be a fount of knowledge in respect of things to do with computer chess. He
>>>>has been in conversation with the DB people, he has written his own programs
>>>>(one of which I believe won the World Computer Chess Championships - can't
>>>>remember which year though), and I have found his suggestions of great help in
>>>>the past.
>>>>
>>>>
>>>>>
>>>>>As i pointed out, a 'smart' program like Deep Blue can never do much
>>>>>knowledge within 10 clocks. This means that the total depth of everything
>>>>>is 10 clocks.
>>>>>
>>>>
>>>>Now if I understand Bob correctly, all the positional scoring stuff is done in
>>>>parallel as they did in Belle, so it seems quite possible to me that it's all
>>>>done in as little as about 10 clocks of a PC's CPU.
>>>
>>>you can *NOT* do everything in parallel.
>>>
>>>Within the evaluation you get *parallel* results, which you put into new
>>>functions (taking again clocks), which after a lot of *conditions*
>>>get into new results.
>>>
>>>if pattern
>>>   then evaluate
>>>
>>>But first you need the compare values for the pattern,
>>>which are from previous functions
>>>
>>
>>
>>Vincent, the depth of your ignorance (in some subjects like hardware design)
>>is only exceeded by the level of your self-assurance that you know everything
>>there is to know...  Here you don't.
>>
>>Here's a basic hardware design lecture:
>>
>>DB is running each chess processor at 24 MHz.  Doesn't that seem a little slow
>>by current standards?  Here's why:  I'm going to round this up to 25 MHz to make
>>the numbers easier.  25 MHz means 40 nanoseconds per clock cycle.  Today's gate
>>delay speeds can quite easily be taken to 100 picoseconds... which means that
>>one clock cycle on the chess processor can withstand 250 gate delays.  That
>>means I can take any combinational logic circuit I want, whose tree is 250
>>deep or less, and execute that in one clock cycle, because all gate delays
>>will add together and still settle in under one clock cycle.  So I can do
>>up to 2^250 discrete ANDs, ORs, etc. and get away with them in *one* cycle.
>>After that one cycle finishes, which can compute up to 512 (total) different
>>things in parallel, I can use the next clock cycle, with another 250 gate
>>delays, to mix and match, shift, multiply, compare, sum, rotate, and whatever
>>else I want to do to each of those first-order terms and produce 256 second-order
>>terms.  I still have eight clock cycles to go.  I can now take any combinations
>>of those 256 second-order terms, do whatever I want to them, and reduce that to
>>128 third-order terms, in one clock cycle.  Ditto for the next 7 to take this
>>finally to a single evaluation term.  I don't do anything that complicated in
>>Crafty, nor do you in DiepX, because you can't and still search more than 10
>>nodes per second.
>
>We don't doubt your historic insight Bob,
>we discussed some weeks ago on the internet something:
>
>I very happily told you that i finally got my pawn majority evaluation
>to work way better: about 500 lines of general C code now.
>
>You said to me you had experimented with some majority code too in the
>past, but found it unnecessary to include in Crafty.
>

No I didn't say that.  I said that I had not yet got it integrated well
enough to make it a part of the released version.  The problem is that I
already have code to handle "distant passed pawns."  Adding "distant
majority" code interacts with this...  I don't want it to end up with
a majority and refuse to liquidate it to produce a distant passed pawn,
nor do I want it to liquidate too soon.  That's the *only* reason it is not
"in" yet...  it takes time to adjust it so that it works with existing code
and doesn't produce bizarre or premature moves.





>My pawn majority code includes all kinds of loops to prevent, for example,
>white c3,d3,e4 versus b5,c5,d5 being seen as a majority, whereas
>c3,d3,e4 versus e3,e5 (optionally a pawn at f6 or something)
>and b6 is a clear majority (leaving out the
>free pawn at e3).
>
>Also a2,b2 versus a7 is a majority, where a2,b2 versus a7,b7 is
>clearly not.
>
>Now i suppose (knowing that your rating in the past was 2200+)
>that we were talking about recognising the same.

yes... same idea...


>
>Please post that code, and then throw it into your hardware compiler,
>and give us the NAND and NOR circuit needed.

That is quite trivial.  Remember that the Cray has hardware to do floating
point multiplies, which take *many* of *your* cycles, but don't take
many of "Seymour's" cycles...  because he's a hardware engineer...



>
>Then the audience who might read it will see how many gates
>this code will already take.


This is not difficult to recognize.  When I am finished with mine, you
can look at how I did it, and you'll see it is *not* 500 lines of C when
*I* write it using bitmaps.  It is quite small at present, because I let
other existing code pick out things that this code then uses quite quickly
to detect a majority...
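To make the bitmap idea concrete, here is a minimal hypothetical sketch of a majority test over file masks. This is NOT Crafty's actual code; the mask values, the `queenside_majority` name, and the bare pawn-count comparison (which ignores the candidate-pawn and blocked-pawn subtleties Vincent raises above) are all assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical bitboard sketch, not Crafty's real evaluation code.
   Squares are numbered 0..63 with a1 = 0 and h8 = 63. */

typedef uint64_t bitboard;

/* All squares on files a-c: the queenside pawn zone. */
static const bitboard QUEENSIDE = 0x0707070707070707ULL;

/* Count the set bits (Kernighan's trick: each step clears one bit). */
static int popcount(bitboard b) {
    int n = 0;
    while (b) { b &= b - 1; n++; }
    return n;
}

/* White has a queenside majority if it has strictly more pawns than
   black on files a-c.  A real evaluator would refine this further. */
int queenside_majority(bitboard white_pawns, bitboard black_pawns) {
    return popcount(white_pawns & QUEENSIDE) >
           popcount(black_pawns & QUEENSIDE);
}
```

With Vincent's own example: white pawns on a2,b2 against a lone black pawn on a7 register as a majority, while a2,b2 against a7,b7 do not. The AND-plus-count shape is also what makes such a test plausible as parallel hardware: every file mask can be applied at once.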


>
>This is the challenge, and you are the one to show how to do it.
>
>No theoretical bullshit, it's clear that all evidence shows how little
>knowledge DB has, now it's time to show the audience why it's
>so hard for low rated people to program chessknowledge.
>
>I doubt that Hsu has ever heard of the term 'pawn majority'.
>I dropped this term one day among some chess programmers,
>and they all didn't even know what it means. Because, being
>the programmer, he's the one who needs to implement it, so he needs to
>be the one that exactly understands what it is and what it is not.


"Has never heard of pawn majority" after having multiple GM players
work with them to make it better?  You really believe that?  Then I have
this Bridge in New York that I'd like to get rid of.  It's been in my
family for years, but I'm willing to sell it to you cheap.  Very cheap.
I'm sure you'll buy that too...



>
>I have given you some easy examples, now show us your pseudo code
>which detects this (bitboards aren't requested, but knowing your
>enthusiasm for them you may use them), then also translate them
>to NORs and NANDs (do it automatically),
>
>Then we will see how many gates the compiler needs for just what
>i call 1 step of knowledge in my program, where other parts
>go think about.
>

what's with the "how many gates"?  When we can put millions on a chip
now, this is meaningless.  No one counts them any longer, which is why
we have FP instructions like SQRT, SIN, COS, TAN on the x86 FPU.  And
why RISC machines now have full instruction sets rather than leaving out
integer mult/div like the first SPARC did.


>BTW you may take more gates than your 250.


Vincent, again you are screwing up.  I was talking about a *TREE* of
gates.  With a *DEPTH* of 250 gates front to back, because each gate
has a delay of 100 picoseconds (in my example), and all the gates have
to settle, drive their successors, and those have to settle... and I have to
do that within 40 nanoseconds to run at 25 MHz.  But this is a *TREE*... get
it?  A *tree*.  How many gates can I use, with one at the top, if this
is a *binary* tree with a depth of 250?  Can you compute 2^250?  I can't
put that on a chip today, but if I could, I could get the result out in
40 nanoseconds, with no problem.

So please pay attention.  When you talk about "gate delay" you are
talking about the longest path through the logic, where gate a feeds gate
b which feeds gate c and so forth.

When you (one day) understand hardware design, we won't have to have this
elementary discussion any longer.  Just think "depth of a binary tree" and
*not* "total gates inside the tree".  Just like a chess program, in essence.

Imagine a binary tree with a depth of 250.  Since all the gates operate
in parallel, it's only a question of how soon the gates at the bottom can
"settle" and propagate info to the next level up, and then those have to settle
and propagate, and so forth.  "Gate delay" refers to the time interval from
when the inputs become stable until the outputs become stable.  If you put
two such gates end-to-end, the delay is 2x.  If they are side-by-side and
they feed a single gate above them, you have 3 gates, but 2 gate delays,
because those two side-by-side gates settle in parallel, then the gate above
them takes the two stabilized outputs and takes a gate delay itself to
do its thing.

Do you get it now?  And can we move on to something that is new and
interesting?  Not old, well-known hardware design principles?
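The clock-budget arithmetic in this exchange boils down to one division, sketched below. The 25 MHz clock and 100 ps gate delay are the round example numbers used in the discussion, not measured Deep Blue specifications, and `depth_budget` is just an illustrative name.

```c
/* Sketch of the clock-budget arithmetic: how many gate delays (i.e.
   how deep a longest path through combinational logic) fit into one
   clock cycle.  Integer picosecond arithmetic avoids rounding issues. */
long long depth_budget(long long clock_hz, long long gate_delay_ps) {
    long long period_ps = 1000000000000LL / clock_hz; /* one cycle, in ps */
    return period_ps / gate_delay_ps;
}
```

With these numbers, `depth_budget(25000000, 100)` gives 400: a 40 ns cycle divided by 100 ps per gate. A full binary tree of that depth would hold about 2^400 gates, which is exactly the path-length-versus-gate-count distinction being argued here.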



>
>Because Marcel van Kervinck, who calculates better than both of us,
>says 40 ns = 400 gates x 100 picoseconds.

yes...  but he is talking about the depth of the tree from top to
bottom = 400, *not* the total number of gates...  otherwise the Pentium
would be running at a millisecond clock cycle...



>
>So i give you for free 400 gates a clock.
>No doubt you can do a lot with 400 gates, as it is hardware,
>the challenge is clear.

I'm not going to use 400 gates.   I'm going to use 2^400 gates, and I'm
going to smoke your program so badly, you may well never see move 20 in
a game... :)

And I'm going to do this because I know how to design hardware, while you
don't have a clue.

>
>The audience will finally see how big the difference 'human intelligence'
>and computer intelligence is. If you are prepared to take part in it.
>Should not be difficult. You have the code in your attic, hardware
>compilers are in your opinion very good, so we will wait for the results.
>
>Alea iacta est.


I can do this... But hopefully, by now, you see that your basic understanding
of the problem is totally flawed and incorrect.  Big difference between
400 gates and 2^400 gates...  *BIG BIG BIG* difference.  Go read a good
architecture book.  Or a good "designing VLSI circuits" book.  Then you won't
put your foot in your mouth like this.  It's a mistake even a freshman in
EE wouldn't make.



>
>>>After all this you get an evaluation which you have to *wait* for
>>>before you can use them in your search.
>
>>*not* if I do it in parallel, and overlap the search... and start the
>>eval for the next ply right after the makemove for *this* ply.  That's
>>what you don't (or won't) understand.  You are thinking serially.  At this
>>rate, parallel Diep has no chance to succeed, until you modify your thinking.
>
>I think high level, it's indeed way easier to think serially. To quote some
>hardware designers: "the compiler makes it parallel".

so does the hardware.  check out the P6/PII...



>
>However you give yourself hints to this compiler, which makes you again
>a low level designer, where the ultimate goal of course for us is to
>design high level, instead of knowing what a NAND or a NOR is.
>
>It surprises me that someone that is teaching others still thinks so at such
>a low level.

I think at such a low level when the subject is "designing hardware to
play chess, like DB as an example."  Because if you don't understand the
"low level" stuff, you make stupid comments that are not just wrong, but
wrong by a huge margin (hint: compare 400 to 2^400 and then come
back to the discussion with a little better preparation).


>
>>>So yes a lot goes parallel, but you cannot do *everything* parallel,
>>>because things are depending upon each other.
>>>
>>>>>Also first Bob wrote that Deep Blue had 1000 adjustable parameters.
>>>>>As i pointed out 1000 adjustable parameters are hardly more than
>>>>>piece square tables (already taking 12*64=768 adjustable parameters).
>>>
>>>>So you say that full piece-square tables plus about another 232 parameters is
>>>>not enough? Sounds like plenty to me.
>>>
>>>That's not much, especially if you give for example freepawns at different
>>>squares different bonuses. That already takes care for 64 values.
>>>
>>>Further later was claimed that the amount was 6000, which is impossible
>>>to do in 10 clocks.
>>
>>
>>only for those with no background in circuit design.  You'd be amazed what the
>>Pentium Pro processor is doing in *one* clock cycle *internally*.  And I'm not
>>talking about the 200 MHz clock... I'm talking at gate-delay levels...
>>
>>
>>>
>>>If i parallelize only the evaluation (not taking care of the search, where you
>>>lose clocks too), then i can NEVER do it in parallel in 10 clocks.
>>
>>
>>and just because *you* can't do it in 10 means I can't?  Seems like sound logic
>>to me...
>>
>>
>>>
>>>No way. I will not make it within 100 clocks even.
>>>
>>>Too many things depending upon each other.
>>>
>>>>>So when smoke about knowledge has been removed, we can go to
>>>>>the supposed speed of it.
>>>
>>>>>Then it appears that from the few printouts i saw of Deep Blue,
>>>>>that it just searched 11 or 12 ply.
>>>
>>>>>That's not much for a 200 million nodes a second program.
>>>>>
>>>>
>>>>Well, I don't know how deep you would expect it to search, but if I could get a
>>>>full-width 12-ply search with quiescence searching on top in middlegame
>>>>positions, I would not grumble!
>>>
>>>Buy a PII-300 and see whether you get 11/12 ply in middlegame,
>>>then look to the 200M nodes DB needs for the same.
>>
>>
>>Again... if DB used null-move R=2, with a search like mine, they would do 20+
>>plies in the middlegame.  They do extensions instead.  I do 12 plies at 200K nodes
>>per second.  They are 1,000 times faster... with a branching factor of roughly
>>2.5, what is log base 2.5 of 1000?  That's how many extra plies I'd get at that
>>speed.  So *obviously* they are doing something else.  You compare your 12 plies
>>to their 12 plies.  I say that's total hogwash.
>>
>>
>>>
>>>>>Now when we consider that it has MASSES of disadvantages compared to a normal
>>>>>chess program on a general purpose processor (not to be confused with single
>>>>>chips), then we see after some calculations that those processors indeed are
>>>>>fast, but their practical speed compared to PC programs is a lot less.
>>>>>
>>>>>Still more than PC programs of nowadays, so no doubt it'll kick some butt
>>>>>in blitz, but i doubt whether normal users, who like to see some positional
>>>>>insight from a program as well, will ever be happy with it.
>>>>>
>>>>
>>>>I just want my program to play the strongest possible game of chess I am capable
>>>>of achieving. Whether the program plays in a positional or tactical style I
>>>>don't mind, as long as it wins. I have found that it plays best in open,
>>>>tactical positions, or in positional struggles for control of open files and
>>>>central outposts, but not so well in blocked positions. I think most users
>>>>prefer a tactical program, although my own chess style as a player is more
>>>>stodgy than my program's style. But then this brings into question what exactly
>>>>constitutes a *normal* user.
>>>
>>>Well, what keeps you from developing another pawn-grabbing program?
>>>But be aware that when 2 programs which just capture pawns play each other,
>>>the deeper-searching one will always win,
>>>
>>>because your knowledge is a subset of the other program's, which, when
>>>searching as deep or deeper, will always see the same or more.
>>>
>>>That's why some see DB as *unbeatable*.
>>>I don't see it that way.
>>>
>>>"Tactics is a very important positional aspect of chess"
>>>
>>>>>Further i like to point at the fact that the technology used for DB
>>>>>chessprocessors is very cheap.
>>>
>>>>Great! Where can I buy it?
>>>
>>>Meaning to say, if IBM would allow the DB team to make a PC version,
>>>it would not be so expensive.
>>>
>>>>>The salaries of the PR people who were
>>>>>needed to organize the event probably ate up the biggest part
>>>>>of all the money which IBM claims the Deep Blue-Kasparov event took.
>>>
>>>>>Greetings,
>>>>>Vincent




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.