Computer Chess Club Archives


Subject: Re: Hello from Edmonton (and on Temporal Differences)

Author: Robert Hyatt

Date: 08:53:53 08/05/02


On August 05, 2002 at 08:04:25, Bas Hamstra wrote:

>On August 04, 2002 at 23:58:05, Robert Hyatt wrote:
>
>>On August 04, 2002 at 09:04:39, Vincent Diepeveen wrote:
>>
>>>On July 31, 2002 at 21:35:32, James Swafford wrote:
>>>
>>>>On July 31, 2002 at 18:10:08, Vincent Diepeveen wrote:
>>>>
>>>>>On July 30, 2002 at 22:43:36, James Swafford wrote:
>>>>>
>>>>>>
>>>>>>Hey everyone.  I'm at an AAAI conference in Edmonton.  It's ironic (to me)
>>>>>>that it's been mentioned here recently that Edmonton is a hive of computer
>>>>>>chess enthusiasts.  I don't know if that's true (what's a "hive"? :-), but
>>>>>>there are certainly a few...
>>>>>>
>>>>>>Now to my question.  I asked Jonathan Schaeffer today (who is a really
>>>>>>nice guy, IMO) some questions about his experience with TD learning
>>>>>>algorithms.  He's (co?)published a paper entitled (something like)
>>>>>>"Temporal Difference Learning in High Performance Game Playing."  I
>>>>>>thought the title was a bit misleading, because he focused on checkers.
>>>>>>Checkers programs have much smaller evaluation functions than chess
>>>>>>programs, obviously.  I asked him if he thought the TDLeaf(Lambda)
>>>>>>algorithm had potential in high calibre chess.  (Yes, yes, I know
>>>>>>all about Knightcap... but that wasn't quite "high" calibre.)
>>>>>>He responded with a very enthusiastic "yes".  He said "I'll never manually
>>>>>>tune another evaluation function again."
>>>>>
>>>>>And he'll never do a competitive chess program again either; he forgot to
>>>>>add that too.
>>>>>
>>>>>>A natural follow-up question (which I also asked) is -- then why isn't
>>>>>>everyone doing it??  I don't _believe_ (and maybe I'm wrong about this)
>>>>>>that any top ranked chess programs use it.  His response was simply:
>>>>>>"There's a separation between academia and industry."  Schaeffer stated
>>>>>
>>>>>Schaeffer is well known for his good speeches and answers :)
>>>>>
>>>>>>that perhaps the programmers of top chess programs don't believe in
>>>>>>the potential of temporal difference algorithms in the chess domain.
>>>>>>Or, perhaps, they don't want to put the effort into them.
>>>>>
>>>>>>I believe Crafty is the strongest program in academia now.  If not,
>>>>>>certainly among the strongest.  So, Bob -- have you looked at TDLeaf
>>>>>>and found it wanting?  It's interesting (and perplexing) to me that
>>>>>>paper after paper praises the potential of TDLeaf, but it has _yet_ to
>>>>>>be used in the high-end programs.  Knightcap was strong, but it's
>>>>>>definitely not in the top tier.
>>>>>
>>>>>I remember Knightcap very well. TD learning had the habit of slowly
>>>>>making it more aggressive until it was giving away a piece for 1 pawn and
>>>>>a check.
>>>>>
>>>>>Then of course the 'brain was cleared' and the experiment restarted.
>>>>>So in short, the longer the program used TD learning, the worse it
>>>>>played, from my viewpoint.
>>>>>
>>>>>From a chessplayer's viewpoint it definitely did. Of course we must not
>>>>>forget that at the time it played online, nearly no program was
>>>>>very aggressive. So playing a few patzer moves was a good way to go from
>>>>>scoring perhaps 11% to 12% or so.
>>>>>
>>>>>>Maybe Tridgell/Baxter quit too soon, and Knightcap really could've been
>>>>>>a top tier program.  Or maybe the reason nobody is using TD is because
>>>>>>it's impractical for the large number of parameters required to be
>>>>>>competitive in chess.  Or maybe Schaeffer was right, and the commercial
>>>>>>guys just aren't taking TD seriously.
>>>>>>
>>>>>>Thoughts?
>>>>>>
>>>>>>--
>>>>>>James
>>>>
>>>>
>>>>So, I can put you on record as saying that TD-Leaf is never going to
>>>>produce a high calibre player?
>>>
>>>For a complex evaluation, TD learning will never achieve what hand tuning
>>>by an experienced chess programmer does. That is a statement I'm
>>>willing to make.
>>>
>>>Of course if you start with the most stupidly tuned set, like setting
>>>everything to zero or everything to -1, then it looks as if TD learning
>>>and all other random forms of learning are OK.
>>>
>>>Same for neural networks and such. I toyed quite a bit with simple
>>>neural networks, simply because there are several out there to toy
>>>with.
>>>
>>>The major problem is that if I, for example, conclude that open files are
>>>more important than a pawn in the center, *any* form of general learning
>>>will never, by definition, be able to conclude the same, for the obvious
>>>reason that it has no domain knowledge.
>>>
>>>We can discuss this until chess is solved, but it's definitely a really
>>>simple case here. The proof that it doesn't work is so obvious that I am
>>>always amazed by people who say it works for them.
>>>
>>>Those must be people who don't know the difference between a bishop and
>>>a knight ;)
>>>
>>>What I advise is to tune Crafty against an opponent it scores
>>>80% against now. Tuning something in order to achieve < 50% is really simple,
>>>because there is no proof that it could be done better.
>>>
>>>You really see the difference between automatic tuning and hand tuning
>>>when an engine is already crushing a certain opponent with hand tuning.
>>>
>>>Now automatically tune it to get more than that: 90% instead of 80%.
>>>
>>>If you have an incredibly bad engine and you modify a random thing in
>>>the search, it might still play incredibly badly, but a bit better.
>>>
>>>For the stronger engines, however, this is way harder.
>>>
>>>So turn off learning in Crafty, find an opponent it scores well against,
>>>then autotune Crafty. It has a very small evaluation, and the few patterns
>>>it has don't even require arrays to tune, so there are very few
>>>parameters to tune. Should be easy, no?
>>
>>No arrays?  Have you looked at the code?  It has many arrays, some of which
>>are used in a third-order fashion: lookups and summations from a first array,
>>then that value is used to index into a second array...  some sums of those,
>>and that value indexes into a third array...  (a sketch of this pattern
>>appears at the end of this post)
>>
>>I think TD learning would be tough.  But I don't see why it can't work.  Just
>>because it might be hard to do doesn't mean it is impossible to do...
>
>I have played with it. I am convinced it has possibilities, but one problem I
>encountered was the cause-effect problem. Say, for example, I am a piece down.
>After I lost the game, TD would conclude that the winner had better mobility
>and would tune it up. However, worse mobility was not the *cause* of the loss;
>it was the *effect* of simply being a piece down. In my case it kept tuning
>mobility up and up, to ridiculous values.
>
>Bas.


This is a known problem.  If your eval doesn't have an important term in it,
then trying to tune it over a large set of positions will lead to trouble,
because you will try to cover up the "hole" by tweaking the wrong scores...

Perhaps once you are convinced you are evaluating most of the really important
things, this might work.  I'm a long way from that point myself...
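
For concreteness, here is roughly what the TDLeaf(lambda) update under
discussion looks like for a linear evaluation.  This is a minimal sketch
in the spirit of the Baxter/Tridgell/Weaver KnightCap papers, not anyone's
actual tuning code; NWEIGHTS, feature[] and leaf_eval[] are made-up names,
and the eval is assumed to be a dot product of weights and features, so
the gradient with respect to weight[i] is just feature[i]:

  #define NWEIGHTS 100

  /* One update pass over a finished game.  feature[t][i] holds feature i
     of the leaf of the principal variation from root position t, and
     leaf_eval[t] holds that leaf's evaluation.                          */
  void tdleaf_update(double weight[NWEIGHTS], double feature[][NWEIGHTS],
                     double leaf_eval[], int npos,
                     double alpha,    /* learning rate  */
                     double lambda)   /* temporal decay */
  {
    int t, j, i;

    for (t = 0; t < npos - 1; t++) {
      /* lambda-discounted sum of future temporal differences
         d[j] = leaf_eval[j+1] - leaf_eval[j]                   */
      double decay = 1.0, td_sum = 0.0;
      for (j = t; j < npos - 1; j++) {
        td_sum += decay * (leaf_eval[j + 1] - leaf_eval[j]);
        decay *= lambda;
      }
      /* gradient step: every feature of the leaf gets credit in
         proportion to its value, whether it caused the result or
         merely accompanied it -- which is exactly the cause/effect
         problem with mobility described above.                    */
      for (i = 0; i < NWEIGHTS; i++)
        weight[i] += alpha * feature[t][i] * td_sum;
    }
  }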
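
And on the "no arrays" point, a made-up sketch of the kind of third-order
table usage described above.  The tables and names here are hypothetical,
not Crafty's actual code; the point is just that a tunable value can sit
three indirections deep:

  int first[64];     /* e.g. per-square scores                   */
  int second[256];   /* indexed by a sum of first-level lookups  */
  int third[256];    /* indexed by a value taken from second[]   */

  /* sum first-order lookups for a set of squares, then chain the
     result through two more tables (masked to stay in bounds)    */
  int chained_score(const int squares[], int n) {
    int i, sum = 0;

    for (i = 0; i < n; i++)
      sum += first[squares[i]];
    return third[second[sum & 255] & 255];
  }

Tuning first[] by TD or gradient methods means pushing its effect through
second[] and third[] as well, so a small change in one entry can move the
final score in a very non-linear way.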


