Computer Chess Club Archives


Subject: Re: deep blue's automatic tuning of evaluation function

Author: Robert Hyatt

Date: 14:55:26 03/24/03

On March 24, 2003 at 17:10:59, Vincent Diepeveen wrote:

>On March 24, 2003 at 11:34:53, Robert Hyatt wrote:
>
>>On March 24, 2003 at 10:43:22, Vincent Diepeveen wrote:
>>
>>>On March 23, 2003 at 01:00:47, Robert Hyatt wrote:
>>>
>>>Bob, I explained TD learning here, not Crafty. I didn't knock
>>>Crafty at all.
>>
>>Just look at your reference to Crafty.  It was unnecessary to even name an opponent.
>
>It was needed to explain why TD did score a few points.
>
>It scored those points because Crafty had very weak king safety at the time, which
>explains why, with its crude tuning, the TD learner could give away full pieces for a
>few checks. And of course we must not forget that I, with Diep, searched only about 6 ply
>at the time, while Crafty got 7-8 ply and KnightCap also got 7 or 8 ply.

I don't remember _any_ cases where KnightCap tossed a full piece for a speculative
attack, nor do I remember any cases where that might have worked.

I have all the games in PGN, however, and I'll make a quick pass through them tonight
to see whether I missed such a case.  But IMHO KnightCap played just fine; it just had
trouble with all the various positional ideas and seemed unable to get the weights
right, assuming it even had some of the critical positional ideas included, of course.

>
>Against today's search depths this simply would not score a single point.
>
>In short, the few points it managed to get came from the 'noise' created by the weak
>software of the time, combined with a search depth that was more than adequate
>back then.
>
>So as time went on, those learning algorithms proved worse than they looked at
>the time.

Doesn't mean they won't work, however, which was my point...


>
>
>
>>
>>>
>>>I was talking about the many KnightCap-Crafty games to show why I find the
>>>conclusions drawn from the TD learning experiment overstated.
>>
>>This is yet another "it is impossible because I can't see how to make it work"
>>type of discussion?  _nothing_ says TD won't work.  It _hasn't_ worked real well, so
>
>You go too far now, Hyatt. You are just busy criticizing me, when I know you
>personally don't believe in TD learning one bit anyway. I remember some harsh
>postings from you about it some years ago.
>
>>far, but then again full-width search didn't work in 1970 either.  But it does
>>now.
>
>Feel free to believe in the unknown random learners. I hope you realize that it
>just randomly flips a few values and then retests.
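
For concreteness, that kind of "flip a few values and retest" tuner boils down to a
loop like the one below. This is only an illustrative sketch; the eval_weights table,
the play_match() routine, and the match length are made-up placeholders, not anyone's
actual tuner.

#include <stdlib.h>

#define NWEIGHTS    1000     /* made-up number of tunable eval terms   */
#define MATCH_GAMES  200     /* made-up number of games per experiment */

extern int eval_weights[NWEIGHTS];    /* hypothetical weight table        */
extern double play_match(int games);  /* hypothetical: returns score 0..1 */

/* One step of a naive perturb-and-retest tuner: nudge a few weights at
   random, replay a match, and keep the change only if the score went up. */
void tune_one_step(void)
{
  int idx[3], old[3], i;
  double before = play_match(MATCH_GAMES);

  for (i = 0; i < 3; i++) {                      /* "flip a few values" */
    idx[i] = rand() % NWEIGHTS;
    old[i] = eval_weights[idx[i]];
    eval_weights[idx[i]] += (rand() % 21) - 10;  /* random +-10 nudge   */
  }
  if (play_match(MATCH_GAMES) <= before)         /* "then retests"      */
    for (i = 0; i < 3; i++)
      eval_weights[idx[i]] = old[i];             /* not better: revert  */
}

Because the match result itself is noisy, a loop like this needs a huge number of
games before "better" means anything, which is the real bottleneck.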
>
>>
>>
>>
>>>
>>>I could have said The Baron vs. KnightCap as well, but it played many games against
>>>Crafty.
>>>
>>>You focus too much on the word Crafty here. Focus on the original question
>>>of the poster, which is: "learning?"
>>>
>>>
>>>>On March 22, 2003 at 15:10:52, Vincent Diepeveen wrote:
>>>>
>>>>>On March 22, 2003 at 07:21:26, emerson tan wrote:
>>>>>
>>>>>>I heard that Deep Blue uses automatic tuning for its evaluation function. Does
>>>>>>this mean that as it plays games against humans and computers, Deep Blue will
>>>>>>self-tune its evaluation function based on the results of its games against
>>>>>>humans and computers? If so, is it effective? Are the other programs using
>>>>>>automatic tuning also?
>>>>>
>>>>>Many have tried automatic tuning, but they all failed. Human tuning is way better and
>>>>>more accurate. The big problem with autotuners is the number of experiments needed
>>>>>before a good tuning emerges.
>>>>>
>>>>>Basically, you can only prove a tuning is good by playing games against others.
>>>>>
>>>>>In short, that will take another few thousand years for a complex
>>>>>evaluation.
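
To put a rough number on that: the score of a single game is so noisy that separating
two settings which differ by only a few Elo already takes tens of thousands of games.
A back-of-envelope sketch using the usual logistic Elo model and a 2-sigma criterion;
the numbers are an estimate, not a measurement:

#include <math.h>
#include <stdio.h>

/* Expected score for a d-Elo edge is E = 1 / (1 + 10^(-d/400)).  The
   per-game standard deviation of the score is at most 0.5, so seeing the
   edge at roughly 2 sigma needs |E - 0.5| >= 2 * 0.5 / sqrt(N), i.e.
   N >= 1 / (E - 0.5)^2 games. */
int main(void)
{
  double d;
  for (d = 1.0; d <= 32.0; d *= 2.0) {
    double e = 1.0 / (1.0 + pow(10.0, -d / 400.0));
    double n = 1.0 / ((e - 0.5) * (e - 0.5));
    printf("%5.0f Elo edge -> roughly %8.0f games\n", d, n);
  }
  return 0;
}

Multiply a per-change test of that size by thousands of candidate parameter changes
and the total amount of play becomes enormous, which is the point being made above.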
>>>>>
>>>>>That doesn't change the fact that everyone has tried a bit in that direction and
>>>>>probably will keep trying.
>>>>>
>>>>>Current algorithms simply do not work.
>>>>>
>>>>>Also, the much-praised TD learning simply does not work.
>>>>>
>>>>>What it did was overreact to things in the long term. For example, some years ago,
>>>>>when Crafty ran on a single-CPU Pentium Pro 200 and those versions had weak king
>>>>>safety, it would find that weakness of Crafty, but not in the way we would conclude it.
>>>>>It just concluded that sacrificing pieces and such toward Crafty's king
>>>>>would work a bit.
>>>>
>>>>
>>>>
>>>>Why don't you spend more time talking about _your_ program and less time
>>>>knocking mine?  Crafty 1996 (Pentium Pro 200) did just as well against
>>>>Diep 1996 as Crafty of today does against Diep of today.  If my king safety
>>>>was weak in 1996, so was yours.
>>>>
>>>>Why don't you give up this particular path of insults?  It only makes you
>>>>look idiotic.  That "weak" version of Crafty in 1996 finished in 4th place
>>>>at the WMCCC event.  Where did yours finish that year?  In fact, I'd bet
>>>>that KnightCap had a better record against _your_ program than it did
>>>>against Crafty.  Which makes your comparison all the more silly.
>>>>
>>>>Again, you should spend more time working and less time knocking other
>>>>programs.  Your program would get better, faster.
>>>>
>>>>
>>>>
>>>>>
>>>>>But 'a bit' is especially important here. It would of course have been happy
>>>>>scoring 20% or so. So if you gamble in 10 games in a row, win 2 of them that
>>>>>way, and lose the others without a chance, then that might seem to work, but in
>>>>>absolute terms you are doing a bad job, because scoring 20% sucks.
>>>>>
>>>>>Of course, those were the days when, at 5 0 or 5 3 blitz levels, the programs got
>>>>>very small search depths. Computer-computer games around 1997 were quite often
>>>>>tactically dominated.
>>>>>
>>>>>Concluding that the learning works based on those slaughter matches (where the
>>>>>KnightCap program got butchered many games in a row, then won one game with
>>>>>some aggressive sacrifice) is not the right conclusion, IMHO.
>>>>>
>>>>>Note that other self-learning experts have further criticism of TD learning, which I
>>>>>do not share much. Their criticism is that some things are hard-coded, so the
>>>>>tuner can't go wrong there. For me, that 'cheating' is a smart thing to do,
>>>>>however, because it is clear that tuning without domain knowledge isn't going to
>>>>>work within a hundred years.
>>>>>
>>>>
>>>>The DB guys didn't claim to do TD learning or any other _automated_ learning
>>>>whatsoever.  They claimed to have an evaluation _tuning_ tool that did, in fact,
>>>>seem to work.
>>>>
>>>>One problem is that when you change an eval term to correct one flaw, you can
>>>>introduce other bad behavior without knowing it.  They tried to solve this with
>>>>a least-squares summation over a bunch of positions, so that you could
>>>>increase something that needed help without wrecking the program in positions
>>>>where it was already doing well.
>>>>
>>>>The idea had (and still has) a lot of merit...  Just because nobody does it
>>>>today doesn't mean it is (a) bad, (b) impossible, or (c) anything else.
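
A minimal sketch of that least-squares idea: choose weights that minimize the summed
squared error between the evaluation and a target score over a whole set of positions,
so that fixing one position cannot silently wreck the rest. The feature/target arrays
and the simple gradient step below are illustrative assumptions, not the DB team's
actual tool.

#include <stddef.h>

#define NPOS  5000   /* illustrative number of training positions */
#define NTERM   32   /* illustrative number of evaluation terms   */

extern double feature[NPOS][NTERM]; /* term t measured in position p */
extern double target[NPOS];         /* desired score for position p  */
double weight[NTERM];               /* the weights being tuned       */

/* One pass of gradient descent on  sum_p (eval(p) - target[p])^2  where
   eval(p) = sum_t weight[t] * feature[p][t].  Every position pulls on
   every weight a little, instead of one position being hand-fixed. */
void least_squares_pass(double rate)
{
  size_t p, t;
  for (p = 0; p < NPOS; p++) {
    double err = -target[p];
    for (t = 0; t < NTERM; t++)
      err += weight[t] * feature[p][t];        /* current eval minus target */
    for (t = 0; t < NTERM; t++)
      weight[t] -= rate * err * feature[p][t]; /* step down the error surface */
  }
}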
>>>>
>>>>
>>>>
>>>>>In later years, as hardware became faster, evaluations also became better,
>>>>>without clear weak links.
>>>>>
>>>>>Evaluations without clear weak links are very hard to tune automatically.
>>>>>
>>>>>Basically, tuners have no domain knowledge, so if you have a couple of thousand
>>>>>patterns, not to mention the number of adjustable parameters, it will take
>>>>>more time than there are chess positions to tune them automatically.
>>>>>
>>>>>And it is sad that the much-praised TD learning, which completely sucked
>>>>>everywhere from an objective perspective, is hailed as such a big step.
>>>>>
>>>>>Basically, TD learning demonstrates that someone *did* make the effort to implement TD
>>>>>learning, and we can praise the person in question for doing that.
>>>>>
>>>>>Most 'learning' plans never leave the paper.
>>>>>
>>>>>But having seen hundreds of games from KnightCap, I definitely learned that
>>>>>tuning without domain knowledge is really impossible.
>>>>>
>>>>>A result of all that paper-only learning in the AI world is this: chess programs improve
>>>>>and improve, but also get more complex. To list a few of the things programs
>>>>>might simultaneously have (without saying A sucks and B is good):
>>>>>  - alpha-beta
>>>>>  - negamax
>>>>>  - quiescence search
>>>>>  - hash tables
>>>>>  - multi-probing
>>>>>  - complex data structures
>>>>>  - null move
>>>>>  - possible forms of forward pruning
>>>>>  - killer moves
>>>>>  - move ordering
>>>>>  - SEE (qsearch, move ordering)
>>>>>  - futility pruning
>>>>>  - lazy evaluation
>>>>>  - quick evaluation
>>>>>  - PSQ tables to order moves
>>>>>  - ProbCut
>>>>>  - reductions
>>>>>  - forward pruning (one of the many forms)
>>>>>  - iterative deepening
>>>>>  - internal iterative deepening (move ordering)
>>>>>  - fractional ply depth
>>>>>  - parallel search algorithms
>>>>>  - check extensions
>>>>>  - singular extensions
>>>>>  - mating extensions
>>>>>  - passed pawn extensions
>>>>>  - recapture extensions
>>>>>  - other extensions (so many the list is endless)
>>>>>
>>>>>This is just what I could type within 2 minutes. In short: all kinds of
>>>>>algorithms and methods get combined into something ever more complex, and it
>>>>>is all 'integrated' somehow, partly domain-dependent, and so requires a lot of
>>>>>chess-specific code in order to work well.
>>>>>
>>>>>Because 99.9% of all tuning algorithms never leave the paper, they usually can
>>>>>be described in a few lines of pseudo code. For that reason, most of them do
>>>>>essentially the same thing in the same way, but under a new cool name. Just
>>>>>like null move with R=2 and R=3: it is exactly the same algorithm (null move), with
>>>>>only a small implementation detail that differs.
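
For reference, R here is just the extra depth reduction used by the null-move trick:
give the opponent a free move, search the result to reduced depth, and cut off if it
still fails high. A bare-bones sketch, with search(), make_null(), unmake_null(), and
in_check() as assumed engine hooks rather than any particular program's code:

#define R 2   /* the whole difference between "R=2" and "R=3" is this constant */

extern int  search(int depth, int alpha, int beta);  /* assumed engine hooks */
extern void make_null(void);
extern void unmake_null(void);
extern int  in_check(void);

/* Returns nonzero if a reduced-depth null-move search already refutes the
   position, i.e. scores >= beta even after giving the opponent a free move. */
int null_move_cutoff(int depth, int beta)
{
  int score;

  if (depth <= R || in_check())   /* skip when in check or too shallow   */
    return 0;
  make_null();                    /* "pass": let the opponent move again */
  score = -search(depth - 1 - R, -beta, -beta + 1);  /* null window      */
  unmake_null();
  return score >= beta;
}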
>>>>>
>>>>>Yet the AI-learning world is so simplistic that most concepts are
>>>>>merely paper concepts which do not work in the real world.
>>>>>
>>>>>If they ever do leave the paper, they perform a silly experiment or draw
>>>>>conclusions in the wrong way. Objective science never gets done there.
>>>>>
>>>>>Those scientists simply hate programming. So the chance that they combine methods,
>>>>>and especially combine them with domain-dependent knowledge, is close to zero.
>>>>>In that light, TD learning is seen as a grown-up algorithm.
>>>>>
>>>>>The good thing about it is that it does something without crashing. The bad
>>>>>thing about it is that it left the paper, so we could see how poorly it worked.
>>>>>
>>>>>It would be too harsh to say that the work has been done for nothing. On the
>>>>>contrary: it simply proves how much of AI is paper work now and how people
>>>>>can believe in the unknown.
>>>>>
>>>>>I have met at least 100 students who wanted to make a self-learning thing: a
>>>>>neural network, a genetic algorithm, etc.
>>>>>
>>>>>In fact, in all those years I have seen only one thing work, and that was a
>>>>>genetic algorithm finding a shorter path than my silly brute-force search did
>>>>>(we are talking about hundreds of points). That genetic algorithm, however, had
>>>>>domain-dependent knowledge and was already helped with a short route to start with...
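
Roughly the kind of thing that worked there: start from an already-decent route (the
domain knowledge), mutate it with a tour-specific operator such as reversing a segment,
and keep a child only when the path gets shorter. A toy, population-of-one sketch with
a made-up dist() function and point count, not the actual student project:

#include <stdlib.h>

#define NCITY 100                      /* illustrative problem size */

extern double dist(int a, int b);      /* assumed distance function */

static double tour_length(const int *t)
{
  double len = 0.0;
  int i;
  for (i = 0; i < NCITY; i++)
    len += dist(t[i], t[(i + 1) % NCITY]);
  return len;
}

/* Seeded improver: mutate the route by reversing a random segment and keep
   the mutation only if the tour got shorter; otherwise reverse it back. */
void improve_route(int *route, int generations)
{
  double best = tour_length(route);
  int g;
  for (g = 0; g < generations; g++) {
    int i = rand() % NCITY, j = rand() % NCITY, lo, hi, pass;
    if (i == j) continue;
    for (pass = 0; pass < 2; pass++) {
      lo = i < j ? i : j;
      hi = i < j ? j : i;
      while (lo < hi) {                /* reverse route[lo..hi]           */
        int tmp = route[lo]; route[lo] = route[hi]; route[hi] = tmp;
        lo++; hi--;
      }
      if (pass == 0) {
        double len = tour_length(route);
        if (len < best) { best = len; break; }   /* keep the improvement  */
      }                                /* second pass undoes a bad mutation */
    }
  }
}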
>>>>>
>>>>>Best regards,
>>>>>Vincent


