Computer Chess Club Archives

Subject: Re: How Flexible opinions are: DB and further

Author: Robert Hyatt

Date: 07:55:56 06/27/98

On June 27, 1998 at 07:00:13, Vincent Diepeveen wrote:

>
>On June 26, 1998 at 17:19:51, Robert Hyatt wrote:
>
>>On June 26, 1998 at 05:52:15, Vincent Diepeveen wrote:
>
>>>>>>>>You see it wrong. If you do 4-ply searches without hash etc., then
>>>>>>>>2.5 million drops quickly to, say, 300k nodes a second.
>>>>>>>>
>>>>>>>>So in fact you're playing against a kind of Fritz5, which DOES search
>>>>>>>>all leaves full-width, which gives you some extra tactics. So against
>>>>>>>>commercial programs, which are only tested on the same hardware and are
>>>>>>>>only busy with outbooking and trying to finish the game by means of
>>>>>>>>tactics, you then win by big numbers, but it will play horribly.
>>>>>>>>
>>>>>>>>Vincent
>>>>>>>
>>>>>>>Hi Vincent,
>>>>>>>
>>>>>>>I'm not sure I understand - can you explain what you mean? I do use hash
>>>>>>>tables - would the chip not have access to my system's RAM? And when you
>>>>>>>say it would play horribly, why so? How could the chip make it play any
>>>>>>>worse than without the chip?
>>>>>>>
>>>>>>>Best wishes,
>>>>>>>
>>>>>>>Roberto
>>>>>>
>>>>>>
>>>>>>You have to ignore part of what Vincent writes, because he is an
>>>>>>"anti-Deep-Blue" person from way back.  A couple of key points:
>>>>>>
>>>>>>1.  No, a chess processor could not see your RAM.  But they do have their
>>>>>>own hash table memory.
>>>>>>
>>>>>>2.  A PC program with one of these chips attached would be far and away
>>>>>>stronger than any existing computer chess program running on a PC.  A chess
>>>>>>board that they used had 8 processors on it.  A PC could use the same
>>>>>>approach, and search at about 20 million nodes per second.
>
>>>>>You have to ignore most of what Bob writes about Deep Blue, as he is a
>>>>>known pro-Deep-Blue authority.
>
>>>>
>>>>If Bob is an authority on DB, then why should I ignore him? I have always
>>>>found Bob to be a font of knowledge in respect of things to do with
>>>>computer chess.
>So why may he say "ignore Vincent", but may I not say "ignore Bob"?
>
>If IBM sales go down I'll advise them to hire Bob: scientific, patient,
>and with a clear opinion.
>
>>>>>As I pointed out, a 'smart' program like Deep Blue can never do much
>>>>>knowledge within 10 clocks. This means that the total depth of everything
>>>>>is 10 clocks.
>
>I see that the full evaluation of DB is 8 clocks now and is only done in
>20% (!) of the cases.
>
>So they're doing a kind of lazy evaluation. I didn't know it could be skipped
>80% of the time; that means their window for lazy evaluation is quite small,
>which means that the *reach* of the evaluation is not big.
>

It doesn't mean that at all.  You seem to be able to measure the mass of a
rock, and compute the total number of atoms in the universe, without having
a clue about anything else.



>The more knowledge is in your evaluation, the more the terms can differ,
>the wider the window that comes out of evaluation, and the less you can
>lazy evaluate. My window out of evaluation is usually around
>[-12 pawns; +12 pawns] of positional score, where usually the black score
>(say 11 pawns) compensates for the 11 pawns of white (giving an evaluation
>of 0), but sometimes this isn't the case, causing huge window differences.
>
>Hyatt, your turn. Better cut and never paste this, he he. This is the x-th
>hint that the evaluation does not have that much depth.


Nope... it only shows ignorance of hardware.  And here's a hint:  the
ignorance is *not* on my part.

I see *nothing* that says an evaluation has to produce scores that
are +/- 12.  But you need to read about Belle and the fast/slow evaluations
before you write more... then you'll understand what this is all about.
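
Here's a rough C sketch of the fast/slow ("lazy") evaluation idea, just to
illustrate what I mean; the names and the margin value are made up, not
Crafty's or Belle's actual code:

/* Fast/slow ("lazy") evaluation sketch.  All names and the margin are
   hypothetical. */

#define LAZY_MARGIN 300   /* centipawns: max swing the slow terms can add */

typedef struct Position Position;

extern int MaterialScore(const Position *pos);  /* cheap, always computed */
extern int SlowEvaluate(const Position *pos);   /* expensive positional terms */

int Evaluate(const Position *pos, int alpha, int beta) {
  int fast = MaterialScore(pos);

  /* If even a maximal positional swing can't bring the score back inside
     the (alpha, beta) window, skip the slow terms entirely.  The wider
     your possible positional swing, the larger LAZY_MARGIN has to be,
     and the less often this shortcut fires. */
  if (fast - LAZY_MARGIN >= beta || fast + LAZY_MARGIN <= alpha)
    return fast;

  return fast + SlowEvaluate(pos);
}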


>
>Can some known lazy evaluators say what % of nodes they can evaluate
>lazily, and what window is needed to get 80%?
>
>>>>Now if I understand Bob correctly, all the positional scoring stuff is done in
>>>>parallel as they did in Belle, so it seems quite possible to me that it's all
>>>>done in as little as about 10 clocks of a PC's CPU.
>
>>>You can *NOT* do everything in parallel.
>
>>>Within evaluation you get *parallel* results, which you put into new
>>>functions (again taking clocks), which after a lot of *conditions*
>>>turn into new results.
>
>>>if pattern
>>>   then evaluate
>>>
>>>But first you need the compare values for the pattern,
>>>which come from previous functions
>>>
>>
>>
>>Vincent, the depth of your ignorance (in some subjects, like hardware design)
>>is only exceeded by the level of your self-assurance that you know everything
>>there is to know...  Here you don't.
>>
>>Here's a basic hardware design lecture:
>>
>>DB is running each chess processor at 24 MHz.  Doesn't that seem a little
>>slow by current standards?  Here's why.  I'm going to round this up to 25 MHz
>>to make the numbers easier.  25 MHz means 40 nanoseconds per clock cycle.
>>Today's gate delays can quite easily be taken down to 100 picoseconds, which
>>means that one clock cycle on the chess processor can accommodate 400 gate
>>delays.  That means I can take any combinational logic circuit I want, whose
>>tree is 400 deep or less, and execute it in one clock cycle, because all the
>>gate delays add together and still settle in under one clock cycle.  So I can
>>do up to 2^400 discrete ANDs, ORs, etc. and get away with them in *one*
>>cycle.  After that one cycle finishes, which can compute up to 512 (total)
>>different things in parallel, I can use the next clock cycle, with another
>>400 gate delays, to mix and match, shift, multiply, compare, sum, rotate, and
>>whatever else I want to do to each of those first-order terms, and produce
>>256 second-order terms.  I still have eight clock cycles to go.  I can now
>>take any combinations of those 256 second-order terms, do whatever I want to
>>them, and reduce them to 128 third-order terms, in one clock cycle.  Ditto
>>for the next 7 to take this finally to a single evaluation term.  I don't do
>>anything that complicated in Crafty, nor do you in Diep.  Because you can't,
>>and still search more than 10 nodes per second.
>
>So this clearly means that every bit of their knowledge is kind of
>'assembler'-like, meaning that a complex pattern can only be done in
>1 clock cycle. So the next complex pattern they have (not even
>counting loops, as they can do those 64-way parallel) must wait.


It waits for *one* gate delay, or 100 picoseconds.  Not a *long* wait.  And
definitely not a *one major clock cycle* delay as you so incorrectly want
to assume here...


>
>This is because a pattern depends on a number of factors. No doubt they
>can do 1 pattern in 1 clock cycle, yet the more '&&'s that are needed, the
>tougher it gets for them to combine it with the next pattern.


They can do 1 pattern in a few gate delays.  They can do N patterns
in that same number of gate delays.  Then take a few more to let those
patterns interact with each other... and so forth...
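
If it helps, here is a crude software analogy in C: with bitboards, one
logical operation tests a pattern on all 64 squares at once, and several
independent first-order terms can be computed from the same inputs with no
dependence on each other.  Everything here is illustrative, not DB's (or
anyone's) real terms:

#include <stdint.h>

typedef uint64_t Bitboard;

/* Hypothetical inputs and masks, one bit per square. */
extern Bitboard wp, bp;              /* white/black pawn bitboards */
extern Bitboard center, king_zone;   /* illustrative square masks  */

/* Three independent "first order" terms.  In hardware all three settle
   simultaneously within a few gate delays; in C they at least share the
   same inputs and have no dependence on each other. */
void FirstOrderTerms(int *t1, int *t2, int *t3) {
  *t1 = __builtin_popcountll(wp & center);     /* central pawns (GCC builtin) */
  *t2 = __builtin_popcountll(bp & king_zone);  /* enemy pawns near the king   */
  *t3 = ((wp | bp) == 0);                      /* pawnless-position flag      */
}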


>
>As we know, when designing it is much easier to simply do the next step
>in the next clock cycle, so we now know that it's a maximum of 8 clocks,
>which already shows its limited depth.

Yes... 8 clock cycles = 3,200 gate delays, which gives them the ability
to do 2^3200 things...  Can you compute that number?  Impossible to build
such hardware, of course... but the limit they have and the limit you
imagine are so far apart as to not be comparable by sane folks...


>
>Now we look at how 'plausible' it is that they can do something in 400
>micro-ops (your 400 gate delays). This is rather surprising, but
>I found in my pawn-structure code an AND condition that was very
>long and needed all kinds of array lookups.
>

Using gates, you don't necessarily do "lookups".  You just compare
against each known pattern in parallel, wasting transistors but incurring
100-picosecond delays, not memory-speed delays...


>Now I don't know how many 'gates' an array lookup needs, but
>supposing it takes several, then these 400 will be sufficient for the
>major part but will get tough for a few.


You don't necessarily use array lookups.  If you do, put 'em in static ROM
right in the circuit and you can access the patterns in a couple of gate
delays...


>
>The evaluation speed of Diep differs strongly, btw. It depends heavily
>on the position. The endgame (where I haven't done that much yet) is way
>faster than a complex middlegame.

It doesn't matter.  They evaluate *everything*, because it is done in
parallel, then they "mask out" the stuff that doesn't apply.  It's faster
and simpler.
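
A toy C version of "compute everything, then mask out what doesn't apply";
the terms and the phase flag here are hypothetical, just to show the shape
of it:

typedef struct Position Position;

extern int PassedPawnTerm(const Position *pos);
extern int KingSafetyTerm(const Position *pos);
extern int IsEndgame(const Position *pos);      /* returns 0 or 1 */

int MaskedEvaluate(const Position *pos) {
  int endgame = IsEndgame(pos);
  /* Both terms are always computed (in hardware, by dedicated logic
     running in parallel); the 0/1 mask selects which ones count,
     instead of branching on game phase. */
  return PassedPawnTerm(pos) * endgame
       + KingSafetyTerm(pos) * (1 - endgame);
}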



>
>I doubt your 10-node speed. That's of course laughable. Simply don't
>use bitboards and you will never get down to 10 nodes a second
>on a Pentium Pro.
>
>Because we have 200,000,000 clocks to divide, of which 2 can be executed
>at the same time, and the biggest delay is branch misprediction. So with
>1,500 evaluations a second (the node rate drops to 7,000 in complex
>middlegames on my Pro; the more patterns that are true in a position, the
>slower it gets, of course), and half of them coming out of hash (I have
>several hash tables, among which a table that just stores the evaluation
>of a position, and I get around a 50-60% hit rate out of it),
>
>we get 200,000,000 / 1,500 = an amazing roughly 133,000 clocks per
>evaluation.
>
>Now back to your 1,000 easy rules that are in DB, with 10 nodes a second:
>200,000,000 / (1,000 * 10 nodes a second) = 20,000 clocks per rule.
>So with your laughable 10 nodes a second it would get 20,000 clocks
>for EACH general pattern!

You keep omitting a zero.  I said 10,000... later Hsu said 8,000, which
was reported by someone in r.g.c.c.  I have used 8,000 ever since, to the
best of my knowledge...


>
>Come on, this is ridiculous, even for bitboards on an Intel PC!
>
>20,000 clocks for EACH general pattern!
>
>And that is even assuming that at every node every pattern gets
>matched, which is of course not the case. Only a subset gets
>matched in a position.
>
>So the Pentium Pro is perhaps faster than you think!


Nope... you missed the point.  I pointed out that they could do an
evaluation that is 512 levels deep, while *you* can't... not on a
Pro or any other machine...  They can compute N first-order terms, then
N/2 second-order terms, then N/4 third-order terms...  Do you have terms
that would be called 512th-order?  Hardly.  But if you did, 10 nodes per
second would be fast...
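
For illustration, here's that N -> N/2 -> N/4 reduction as a toy C loop.
With 512 terms it needs only 9 sequential levels, which in hardware is a
handful of clock-cycle steps no matter how wide the evaluation is.
Combine() is a placeholder, not anyone's real mixing function:

#define N_TERMS 512

extern int Combine(int a, int b);   /* any pairwise mixing of two terms */

int ReduceTerms(int term[N_TERMS]) {
  /* Each pass combines pairs from the previous "order", halving the
     count: 512 -> 256 -> ... -> 1, i.e. 9 levels for 512 terms. */
  for (int width = N_TERMS; width > 1; width /= 2)
    for (int i = 0; i < width / 2; i++)
      term[i] = Combine(term[2 * i], term[2 * i + 1]);
  return term[0];                   /* the single final evaluation term */
}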



>
>>So don't tell me what you can't do in parallel.  Maybe *you* don't know how to
>>do it, but *I* do.  I was designing communication hardware when you were just
>>an itch in your daddy's crotch, to quote a famous movie.  And I was doing this
>>same sort of stuff.
>>
>>You obviously have no idea what "clock cycle" means, nor do you have any idea
>>what "gate delays" means, and with that lack of knowledge, it is hopeless to
>>try to explain this to you... any more than I can explain integral calculus
>>to a 5-year-old.
>>
>>
>>
>>
>>>After all this you get an evaluation which you have to *wait* for
>>>before you can use it in your search.


I notice you don't respond when I point out glaring errors in your analysis;
you change the subject to "yet another reason why they can't do this or
that and..."


>>
>>
>>*not* if I do it in parallel, and overlap the search... and start the
>>eval for the next ply right after the makemove for *this* ply.  That's
>>what you don't (or won't) understand.  You are thinking serially.  At this
>>rate, parallel Diep has no chance to succeed, until you modify your thinking.
>>
>>
>>>
>>>So yes, a lot goes parallel, but you cannot do *everything* in parallel,
>>>because things depend upon each other.
>>>
>>>>>Also, first Bob wrote that Deep Blue had 1,000 adjustable parameters.
>>>>>As I pointed out, 1,000 adjustable parameters is hardly more than
>>>>>piece-square tables (which already take 12*64 = 768 adjustable parameters).
>>>
>>>>So you say that full piece-square tables plus about another 232 parameters is
>>>>not enough? Sounds like plenty to me.
>>>
>>>That's not much, especially if you give, for example, passed pawns
>>>different bonuses on different squares. That alone takes up 64 values.
>>>
>>>Furthermore, it was later claimed that the amount was 6,000, which is
>>>impossible to do in 10 clocks.
>>
>>
>>Only for those with no background in circuit design.  You'd be amazed what
>>the Pentium Pro processor is doing in *one* clock cycle *internally*.  And
>>I'm not talking about the 200 MHz clock... I'm talking at gate-delay
>>levels...
>>
>>
>>>
>>>If I parallelize only the evaluation (not counting the search, where you
>>>lose clocks too), then I can NEVER do it in parallel in 10 clocks.
>>
>>
>>And just because *you* can't do it in 10 clocks means I can't?  Seems like
>>sound logic to me...
>>
>>
>>>
>>>No way. I would not even make it within 100 clocks.
>>>
>>>Too many things depend upon each other.
>>>
>>>>>So when the smoke about knowledge has been cleared, we can move on to
>>>>>its supposed speed.
>>>
>>>>>Then it appears, from the few printouts I saw of Deep Blue, that it
>>>>>just searched 11 or 12 ply.
>>>
>>>>>That's not much for a 200-million-nodes-a-second program.
>>>>>
>>>>
>>>>Well, I don't know how deep you would expect it to search, but if I could get a
>>>>full-width 12-ply search with quiescence searching on top in middlegame
>>>>positions, I would not grumble!
>>>
>>>Buy a PII-300 and see whether you get 11/12 ply in the middlegame,
>>>then look at the 200M nodes a second DB needs for the same.
>>
>>
>>Again... if DB used null-move R=2, with a search like mine, they would do 20+
>>plies in the middlegame.  They do extensions instead.  I do 12 plies at 200K
>>nodes per second.  They are 1,000 times faster... with a branching factor of
>>roughly 2.5, what is log base 2.5 of 1,000?  About 7.5.  Add that to mine and
>>that's my search depth at that speed.  So *obviously* they are doing
>>something else.  You compare your 12 plies to their 12 plies.  I say that's
>>total hogwash.
>>
>>
>>>
>>>>>Now when we consider that it has MASSES of disadvantages compared to a
>>>>>normal chess program on a general-purpose processor (not to be confused
>>>>>with single chips), then we see after some calculations that those
>>>>>processors are indeed fast, but that their practical speed compared to
>>>>>PC programs is a lot less.
>>>>>
>>>>>Still more than the PC programs of nowadays, so no doubt it'll kick some
>>>>>butt in blitz, but I doubt whether normal users, who also like to see
>>>>>some positional insight from a program, will ever be happy with it.
>>>>>
>>>>
>>>>I just want my program to play the strongest possible game of chess I am
>>>>capable of achieving. Whether the program plays in a positional or tactical
>>>>style I don't mind, as long as it wins. I have found that it plays best in
>>>>open, tactical positions, or in positional struggles for control of open
>>>>files and central outposts, but not so well in blocked positions. I think
>>>>most users prefer a tactical program, although my own chess style as a
>>>>player is more stodgy than my program's style. But then this brings into
>>>>question what exactly constitutes a *normal* user.
>>>
>>>Well, what stops you from developing another pawn-grabbing program?
>>>But be aware that when 2 programs which just grab pawns play each other,
>>>the deeper-searching one will always win,
>>>
>>>because your knowledge is a subset of the other program's, which, when
>>>searching as deep or deeper, will always see the same or more.
>>>
>>>That's why some see DB as *unbeatable*.
>>>I don't see it that way.
>>>
>>>"Tactics is a very important positional aspect of chess"
>>>
>>>>>Further, I'd like to point out that the technology used for the DB
>>>>>chess processors is very cheap.
>>>
>>>>Great! Where can I buy it?
>>>
>>>Meaning that if IBM allowed the DB team to make a PC version,
>>>it would not be so expensive.
>>>
>>>>>The salary of the PR people who were needed to organize the event
>>>>>probably ate up the biggest part of all that money which IBM claims
>>>>>the Deep Blue-Kasparov event cost.
>>>
>>>>>Greetings,
>>>>>Vincent


