Computer Chess Club Archives



Subject: Re: deep blue's automatic tuning of evaluation function

Author: Vincent Diepeveen

Date: 14:10:59 03/24/03



On March 24, 2003 at 11:34:53, Robert Hyatt wrote:

>On March 24, 2003 at 10:43:22, Vincent Diepeveen wrote:
>
>>On March 23, 2003 at 01:00:47, Robert Hyatt wrote:
>>
>>Bob, I explained the TD learning here, not Crafty. I didn't knock
>>Crafty at all.
>
>Just look at your reference to Crafty.  It was unnecessary to even name an opponent.

It was needed to explain why TD learning scored a few points at all.

It scored those points because Crafty had very weak king safety at the time, which
explains why, even with crude tuning, TD learning could get away with giving up
full pieces for a few checks. And of course we must not forget that I, with Diep,
searched only about 6 ply at the time, while Crafty got 7-8 ply and KnightCap
also reached 7 or 8 ply.

Against today's search depths this simply would not score a single point.

In short, it managed to get a few points thanks to the 'noise' created by major
software weaknesses at the time, in combination with a search depth that was more
than adequate by the standards of the day.

So over time those learning algorithms proved worse than they looked back then.
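
For readers who wonder what the TD learning under discussion concretely does, here is
a minimal sketch of a TD(lambda)-style weight update for a linear evaluation, in the
spirit of KnightCap's TDLeaf variant; the function and variable names below are
illustrative only, not taken from any real engine:

def td_lambda_update(weights, features_per_move, temporal_diffs, alpha=0.01, lam=0.7):
    """Sketch of a TD(lambda) update for a linear evaluation eval_t = dot(weights, f_t).
    features_per_move[t][i] : value of feature i in the position after move t
    temporal_diffs[t]       : d_t = eval_{t+1} - eval_t (with the game result
                              substituted for the evaluation of the final position)"""
    n = len(temporal_diffs)
    for t in range(n):
        # Lambda-discounted sum of the temporal differences from move t onward.
        future = sum((lam ** (j - t)) * temporal_diffs[j] for j in range(t, n))
        # For a linear evaluation the gradient w.r.t. weight i is just feature i.
        for i in range(len(weights)):
            weights[i] += alpha * features_per_move[t][i] * future
    return weights

In KnightCap the temporal differences came from its own search evaluations during
play; whether such an update converges to a sensible tuning is exactly what those
matches were supposed to show.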



>
>>
>>I was talking about the many KnightCap-Crafty games to show why I find that the
>>conclusions drawn from the TD learning experiment are overstated.
>
>This is yet another "it is impossible because I can't see how to make it work" type
>of discussion?  _Nothing_ says TD won't work.  It _hasn't_ worked really well, so

You go too far now, Hyatt. You are just busy criticizing me, while I know you
don't personally believe in TD learning one bit anyway. I remember some harsh
postings of yours on that subject some years ago.

>far, but then again full-width search didn't work in 1970 either.  But it does
>now.

Feel free to believe in these unknown random learners. I hope you realize that
they just randomly flip a few values and then retest.
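
To make concrete what "randomly flipping a few values and retesting" amounts to, a
minimal sketch of such a perturb-and-retest tuner follows; the parameter names and
the play_match callback are placeholders, not any real engine's interface:

import random

def tune_by_random_flips(params, play_match, rounds=1000, step=5):
    """Hill climbing by random perturbation: flip a few values, retest, keep if better.
    params     : dict of evaluation terms, e.g. {"king_safety": 20, "passed_pawn": 12}
    play_match : placeholder callback returning the score (0.0 .. 1.0) of a test
                 match played with the given parameters."""
    best_score = play_match(params)
    for _ in range(rounds):
        candidate = dict(params)
        # Randomly flip a few values up or down.
        for name in random.sample(list(candidate), k=min(3, len(candidate))):
            candidate[name] += random.choice((-step, step))
        score = play_match(candidate)   # the expensive part: an entire test match
        if score > best_score:          # keep the change only if it scored better
            params, best_score = candidate, score
    return params, best_score

The expensive part is play_match(): every candidate setting needs enough games to
separate signal from luck, which is where the huge number of experiments comes from.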

>
>
>
>>
>>I could have said The Baron-KnightCap as well, but KnightCap played many games
>>against Crafty.
>>
>>You focus too much on the word Crafty here. Focus on the original question of
>>the poster, which is: "learning?"
>>
>>
>>>On March 22, 2003 at 15:10:52, Vincent Diepeveen wrote:
>>>
>>>>On March 22, 2003 at 07:21:26, emerson tan wrote:
>>>>
>>>>>I heard that Deep Blue uses automatic tuning for its evaluation function. Does
>>>>>this mean that as it plays games against humans and computers, Deep Blue will
>>>>>self-tune its evaluation function based on the results of its games against
>>>>>humans and computers? If so, is it effective? Are the other programs using
>>>>>automatic tuning as well?
>>>>
>>>>Many have tried automatic tuning, but they all failed. Human tuning is way better
>>>>and more accurate. The big problem with autotuners is the number of experiments
>>>>needed before a good tuning is found.
>>>>
>>>>Basically you can only prove a good tuning by playing games against others.
>>>>
>>>>That will, in short, take another few thousand years to tune a complex
>>>>evaluation.
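
A rough back-of-the-envelope illustration of why game-based testing is so slow (the
numbers here are illustrative assumptions, not measurements): the score of a single
game has a standard deviation of roughly 0.4-0.5 points, so resolving a small
improvement needs thousands of games per candidate setting.

import math

def games_needed(diff=0.01, sigma=0.45, z=1.96):
    """Games needed before the 95% confidence interval on a match score
    is narrower than the improvement we are trying to detect.
    diff  : score improvement to detect (0.01 = 1%, only a handful of Elo)
    sigma : assumed per-game standard deviation of the score
    z     : 1.96 for a 95% confidence level"""
    return math.ceil((z * sigma / diff) ** 2)

print(games_needed())   # roughly 7800 games, for a single candidate tuning

Multiply that by thousands of candidate settings and the pessimism above is not hard
to understand.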
>>>>
>>>>That doesn't take away that everyone has tried a bit in that direction and
>>>>probably will keep trying.
>>>>
>>>>Current algorithms simply do not work.
>>>>
>>>>Also, the much-praised TD learning simply does not work.
>>>>
>>>>What it did was overreact to things in the long term. For example, some years
>>>>ago, when Crafty ran on a single-CPU Pentium Pro 200 and those versions had weak
>>>>king safety, it would find that weakness of Crafty, but not in the way we would
>>>>state the conclusion. It just concluded that sacrificing pieces and such against
>>>>Crafty's king would work a bit.
>>>
>>>
>>>
>>>Why don't you spend more time talking about _your_ program and less time
>>>knocking mine?  Crafty 1996 (Pentium Pro 200) did just as well against
>>>Diep 1996 as the Crafty of today does against the Diep of today.  If my king
>>>safety was weak in 1996, so was yours.
>>>
>>>Why don't you give up this particular path of insults?  It only makes you
>>>look idiotic.  That "weak" version of Crafty in 1996 finished in 4th place
>>>at the WMCCC event.  Where did yours finish that year?  In fact, I'd bet
>>>that KnightCap had a better record against _your_ program than it did
>>>against Crafty.  Which makes your comparison all the more silly.
>>>
>>>Again, you should spend more time working and less time knocking other
>>>programs.  Your program would get better, faster.
>>>
>>>
>>>
>>>>
>>>>But especially 'a bit' is important here. It would of course have been happy
>>>>scoring around 20%. So if you gamble 10 games in a row, win 2 that way, and lose
>>>>the others without a chance, that might seem to work, but in absolute terms you
>>>>are doing a bad job, because scoring 20% sucks.
>>>>
>>>>Of course those were the days when, at 5 0 or 5 3 blitz levels, the programs
>>>>only got very small search depths. Not seldom were computer-computer games
>>>>tactically dominated in those days around 1997.
>>>>
>>>>Concluding that the learning works based upon those slaughter matches (where
>>>>KnightCap got butchered many games in a row, then won one game by some
>>>>aggressive sacrifice) is not the right conclusion IMHO.
>>>>
>>>>Note that other self-learning experts have more criticism of TD learning, which I
>>>>do not entirely share. Their criticism is that some things are hard-coded, so the
>>>>tuner can't go wrong there. For me that 'cheating' is a smart thing to do,
>>>>however, because it is clear that tuning without domain knowledge isn't going to
>>>>work within a year, or a hundred.
>>>>
>>>
>>>The DB guys didn't claim to do TD learning or any other _automated_ learning
>>>whatsoever.  They claimed to have an evaluation _tuning_ tool that did, in fact,
>>>seem to work.
>>>
>>>One problem is that when you change an eval term to correct one flaw, you can
>>>introduce other bad behavior without knowing it.  They tried to solve this by
>>>the least-squares summation over a bunch of positions so that you could
>>>increase something that needed help without wrecking the program in positions
>>>where it was already doing well.
>>>
>>>The idea had (and still has) a lot of merit...  Just because nobody does it
>>>today doesn't mean it is (a) bad, (b) impossible, or (c) anything else.
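
A minimal sketch of that least-squares idea, fitting all the evaluation weights over
a batch of positions at once so a change that helps one position is weighed against
its effect on all the others; this is a generic illustration of the technique, not
the actual DB tool, and the feature and target numbers below are made up:

import numpy as np

def fit_eval_weights(features, targets):
    # Solve min_w sum_p (features[p] . w - targets[p])^2 in one shot, so a weight
    # change that fixes some positions is balanced against the harm it does in
    # all the positions that were already handled well.
    weights, residuals, rank, _ = np.linalg.lstsq(features, targets, rcond=None)
    return weights

# Example with made-up numbers: 4 positions, 2 evaluation features.
features = np.array([[1.0, 0.2],
                     [0.5, 1.0],
                     [0.0, 0.8],
                     [1.0, 1.0]])
targets = np.array([0.9, 0.4, 0.1, 0.8])   # desired scores for those positions
print(fit_eval_weights(features, targets))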
>>>
>>>
>>>
>>>>In later years, when hardware became faster, evaluations also became better,
>>>>without clear weak links.
>>>>
>>>>Evaluations without clear weak links are very hard to tune automatically.
>>>>
>>>>Basically tuners have no domain knowledge, so if you have a couple of thousand
>>>>patterns, not to mention the number of adjustable parameters, it will take more
>>>>time than there are chess positions to tune them automatically.
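
To put a rough number on that: assuming, say, a thousand adjustable terms with only
ten candidate values each, an exhaustive tuner already faces 10^1000 combinations,
while common estimates put the number of legal chess positions somewhere around
10^44 to 10^50.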
>>>>
>>>>And it is sad that the much-praised TD learning, which from an objective
>>>>perspective sucked completely everywhere, is praised so much as a big step.
>>>>
>>>>Basically TD learning demonstrates that someone *did* put in the effort to
>>>>implement TD learning, and we can praise the person in question for doing that.
>>>>
>>>>Most 'learning' plans never leave the paper.
>>>>
>>>>But having seen hundreds of games from KnightCap, I definitely learned that
>>>>tuning without domain knowledge is really impossible.
>>>>
>>>>One result of all that paper-only learning in the AI world is the following.
>>>>Chess programs improve and improve, but also get more complex. To list a few of
>>>>the things programs might simultaneously have (without saying A sucks and B is
>>>>good):
>>>>  - alpha-beta
>>>>  - negamax
>>>>  - quiescence search
>>>>  - hash tables
>>>>  - multi-probing
>>>>  - complex data structures
>>>>  - null move
>>>>  - possible forms of forward pruning
>>>>  - killer moves
>>>>  - move ordering
>>>>  - SEE (qsearch, move ordering)
>>>>  - futility
>>>>  - lazy evaluation
>>>>  - quick evaluation
>>>>  - PSQ tables to order moves
>>>>  - ProbCut
>>>>  - reductions
>>>>  - forward pruning (one of the many forms)
>>>>  - iterative deepening
>>>>  - internal iterative deepening (move ordering)
>>>>  - fractional ply depths
>>>>  - parallel search algorithms
>>>>  - check extensions
>>>>  - singular extensions
>>>>  - mating extensions
>>>>  - passed pawn extensions
>>>>  - recapture extensions
>>>>  - other extensions (so many the list is endless)
>>>>
>>>>This is just what I could type within 2 minutes. In short: all kinds of
>>>>algorithms and methods get combined into something more and more complex, it is
>>>>all 'integrated' somehow and partly domain-dependent, and it requires a lot of
>>>>chess-specific technical code in order to work well.
>>>>
>>>>Because 99.9% of all tuning algorithms never leave the paper, they can usually
>>>>be described in a few lines of pseudo-code. For that reason most of them do
>>>>99.9% the same thing in the same way, but under a new cool name. Just like
>>>>null move with R=2 and R=3 is exactly the same algorithm (null move); only a
>>>>small implementation detail is different.
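
To illustrate that null-move point, below is a minimal sketch of null-move pruning
inside a negamax search, written in Python-like form; the position interface
(make_null_move, legal_moves, and so on) and the evaluate function are hypothetical
stand-ins, and R=2 versus R=3 really is just this one constant:

INFINITY = 10**9
R = 2   # the "different algorithm" between R=2 and R=3 is this single constant

def search(pos, depth, alpha, beta):
    # pos is assumed to expose a hypothetical interface: in_check(),
    # make_null_move()/unmake_null_move(), legal_moves(), make(), unmake().
    if depth <= 0:
        return evaluate(pos)                 # or drop into a quiescence search
    if depth > R and not pos.in_check():
        pos.make_null_move()                 # hand the opponent a free move
        score = -search(pos, depth - 1 - R, -beta, -beta + 1)
        pos.unmake_null_move()
        if score >= beta:                    # even a free move does not help them
            return beta                      # fail high: prune this node
    best = -INFINITY
    for move in pos.legal_moves():
        pos.make(move)
        best = max(best, -search(pos, depth - 1, -beta, -alpha))
        pos.unmake(move)
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    return best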
>>>>
>>>>Yet the AI-learning world is so simplistic that most concepts are just paper
>>>>concepts which do not work in the real world.
>>>>
>>>>If they ever do leave the paper, then the experiment performed is silly or the
>>>>conclusions are drawn in the wrong way. Objective science never gets done there.
>>>>
>>>>Those scientists hate programming, simple as that. So the chance that they
>>>>combine methods, and especially combine them with domain-dependent knowledge, is
>>>>close to zero. In that light TD learning is seen as a grown-up algorithm.
>>>>
>>>>The good thing about it is that it does something without crashing. The bad
>>>>thing about it is that it left the paper, so we could see how poorly it worked.
>>>>
>>>>It would be too harsh to say that the work has been done for nothing. On the
>>>>contrary: it simply proves how much of AI is paper work nowadays and how people
>>>>can believe in the unknown.
>>>>
>>>>I have met at least 100 students who wanted to make a self-learning thing:
>>>>either a neural network, a genetic algorithm, etc.
>>>>
>>>>In fact, in all those years I have seen only one thing work, and that was a
>>>>genetic algorithm finding a shorter path than my silly brute-force search did
>>>>(we are talking about hundreds of points). That genetic algorithm, however, had
>>>>domain-dependent knowledge and was already helped with a short route to start
>>>>with...
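
A minimal sketch of what such a domain-seeded route shortener can look like; this is
a generic, mutation-only evolutionary loop over a list of points (no crossover, for
brevity), and every name in it is illustrative rather than the actual student
project. The key point is that the population is seeded with an already-decent
route, which is the 'domain knowledge' head start:

import random

def route_length(route, dist):
    # Total length of a route, given a dist(a, b) callback for point distances.
    return sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))

def mutate(route):
    # Reverse a random interior segment (2-opt style); the endpoints stay fixed.
    i, j = sorted(random.sample(range(1, len(route) - 1), 2))
    return route[:i] + route[i:j + 1][::-1] + route[j + 1:]

def ga_shorten(seed_route, dist, pop_size=50, generations=500):
    """Evolutionary search seeded with an already-decent route instead of
    random tours: the domain-knowledge head start."""
    population = [seed_route] + [mutate(seed_route) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda r: route_length(r, dist))
        survivors = population[: pop_size // 2]              # keep the shorter half
        children = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return min(population, key=lambda r: route_length(r, dist))

Started from random tours instead of seed_route, the same loop would wander for a
very long time before it beat a decent brute-force result.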
>>>>
>>>>Best regards,
>>>>Vincent


