Subject: Re: Hello from Edmonton (and on Temporal Differences)

Author: Bas Hamstra

Date: 05:04:25 08/05/02

On August 04, 2002 at 23:58:05, Robert Hyatt wrote:

>On August 04, 2002 at 09:04:39, Vincent Diepeveen wrote:
>>On July 31, 2002 at 21:35:32, James Swafford wrote:
>>>On July 31, 2002 at 18:10:08, Vincent Diepeveen wrote:
>>>>On July 30, 2002 at 22:43:36, James Swafford wrote:
>>>>>Hey everyone.  I'm at an AAAI conference in Edmonton.  It's ironic (to me)
>>>>>that it's been mentioned here recently that Edmonton is a hive of computer
>>>>>chess enthusiasts.  I don't know if that's true (what's a "hive"? :-), but
>>>>>there are certainly a few...
>>>>>Now to my question.  I asked Jonathan Schaeffer today (who is a really
>>>>>nice guy, IMO) some questions about his experience with TD learning
>>>>>algorithms.  He's (co?)published a paper entitled (something like)
>>>>>"Temporal Difference Learning in High Performance Game Playing."  I
>>>>>thought the title was a bit misleading, because he focused on checkers.
>>>>>Checkers programs have much smaller evaluation functions than chess
>>>>>programs, obviously.  I asked him if he thought the TDLeaf(Lambda)
>>>>>algorithm had potential in high calibre chess.  (Yes, yes, I know
>>>>>all about Knightcap... but that wasn't quite "high" calibre.)
>>>>>He responded with a very enthusiastic "yes".  He said "I'll never manually
>>>>>tune another evaluation function again."
>>>>And he'll never write a competitive chess program again either; he forgot
>>>>to add that.
>>>>>A natural follow up question (which I also asked) is -- then why isn't
>>>>>everyone doing it??  I don't _believe_ (and maybe I'm wrong about this)
>>>>>that any top ranked chess programs use it.  His response was simply:
>>>>>"There's a separation between academia and industry."  Schaeffer stated
>>>>Schaeffer is well known for his good speeches and answers :)
>>>>>that perhaps the programmers of top chess programs don't believe in
>>>>>the potential of temporal difference algorithms in the chess domain.
>>>>>Or, perhaps, they don't want to put the effort into them.
>>>>>I believe Crafty is the strongest program in academia now.  If not,
>>>>>certainly among the strongest.  So, Bob -- have you looked at TDLeaf
>>>>>and found it wanting?  It's interesting (and perplexing) to me that
>>>>>paper after paper praises the potential of TDLeaf, but it's _yet_ to
>>>>>be used in the high end programs.  Knightcap was strong, but it's
>>>>>definitely not in the top tier.
>>>>I remember Knightcap very well. TD learning had a habit of slowly
>>>>making it more aggressive until it was giving away a piece for 1 pawn and
>>>>a check.
>>>>Then of course the 'brain was cleared' and experiment restarted.
>>>>So in short the longer the program used the TD learning the worse it
>>>>would play, from my viewpoint.
>>>>From a chess player's viewpoint it definitely did. Of course we must not
>>>>forget that in the time it played online, nearly no program was
>>>>very aggressive. So playing a few patzer moves was a good way to get from
>>>>perhaps scoring 11% to 12% or so.
>>>>>Maybe Tridgell/Baxter quit too soon, and Knightcap really could've been
>>>>>a top tier program.  Or maybe the reason nobody is using TD is because
>>>>>it's impractical for the large number of parameters required to be
>>>>>competitive in chess.  Or maybe Schaeffer was right, and the commercial
>>>>>guys just aren't taking TD seriously.
>>>So, I can put you on record as saying that TD-Leaf is never going to
>>>produce a high calibre player?
>>For a complex evaluation, TD learning will never achieve what hand tuning
>>by an experienced chess programmer achieves. That is a statement I'm
>>willing to make.
>>Of course if you start with the most stupid tuned set like putting
>>everything to zero or everything to -1, then it looks as if TD learning
>>and all other random forms of learning are ok.
>>Same for neural networks and such. I toyed quite a bit with simple
>>neural networks, simply because there are several out there to toy with.
>>The major problem is this: I can conclude, for example, that open files are
>>more important than a pawn in the center, but *any* form of general learning
>>will never, by definition, be able to conclude the same, for the obvious
>>reason that it has no domain knowledge.
>>We can discuss till chess is solved, but it's definitely a really simple
>>case here. It is so obvious that it doesn't work that I am always
>>amazed by people who say it works for them.
>>Those must be people who don't know the difference between a bishop and
>>a knight ;)
>>What I advise is to tune Crafty against an opponent that Crafty scores
>>80% against now. Tuning something in order to achieve < 50% is really simple,
>>because there is no proof that it could be done better.
>>You really see the difference between automatic tuning and hand tuning
>>when an engine is crushing a certain opponent with the hand tuning.
>>Now tune it automatically to get more than that: 90% instead of 80%.
>>If you have an incredibly bad engine and you modify a random thing in
>>search, it might still play incredibly badly, but a bit better.
>>For the stronger engines, however, this is way harder.
>>So turn off learning in Crafty, find an opponent it scores well against,
>>then autotune Crafty. It has a very small evaluation, and the few patterns
>>it has don't even require arrays to tune, so there are very few
>>parameters to tune. Should be easy, nah?
>No arrays?  Have you looked at the code?  It has many arrays.  Some of which
>are used in a third-order fashion.  lookups and summations from a first array,
>then that value is used to index into a second array...  some sums of those
>and that value indexes into a third array...
>I think TD learning would be tough.  But I don't see why it can't work.  Just
>because it might be hard to do doesn't mean it is impossible to do...

I have played with it. I am convinced it has possibilities, but one problem I
encountered was the cause-effect problem. Say I am a piece down. After I
lose the game, TD will conclude that the winner had better mobility and will tune
it up. However, worse mobility was not the *cause* of the loss; it was the
*effect* of simply being a piece down. In my case it kept tuning mobility up and
up, to ridiculous values.
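
To make the cause-effect problem concrete, here is a minimal sketch in Python.
It is not code from any real engine. Everything in it is an assumption made for
illustration: a two-feature linear evaluation (material, mobility) squashed
with tanh, plain TD(lambda) with no search (so not TDLeaf), a frozen
hand-tuned material weight, and made-up games in which one side is a piece
up from start to finish and its mobility edge merely follows from that. The
names evaluate, td_lambda_update, simulate_game and train are hypothetical.

```python
import math
import random


def evaluate(weights, features, beta=0.25):
    """Linear evaluation squashed into (-1, 1), read as the expected result."""
    return math.tanh(beta * sum(w * f for w, f in zip(weights, features)))


def td_lambda_update(weights, positions, outcome, learn_mask,
                     alpha=0.05, lam=0.7, beta=0.25):
    """One offline TD(lambda) pass over a finished game; returns new weights."""
    # The game result stands in for the value of the terminal position.
    values = [evaluate(weights, p, beta) for p in positions] + [outcome]
    diffs = [values[t + 1] - values[t] for t in range(len(positions))]
    new_weights = list(weights)
    for t, feats in enumerate(positions):
        # Discounted sum of the temporal differences from position t onward.
        signal = sum(lam ** (j - t) * diffs[j] for j in range(t, len(diffs)))
        slope = beta * (1.0 - values[t] ** 2)  # derivative of tanh
        for i, f in enumerate(feats):
            if learn_mask[i]:
                new_weights[i] += alpha * signal * slope * f
    return new_weights


def simulate_game(rng, length=20):
    """Made-up game: a piece up (or down) throughout; mobility is only an effect."""
    sign = rng.choice([-1, 1])
    material = 3.0 * sign
    # The mobility edge grows during the game because of the extra piece.
    positions = [(material, sign * (1.0 + t / length)) for t in range(length)]
    return positions, float(sign)  # the side with the extra piece wins


def train(games=2000, seed=1):
    """Learn only the mobility weight and record it after every game."""
    rng = random.Random(seed)
    weights = [1.0, 0.1]  # [material, mobility]; material stays frozen
    history = []
    for _ in range(games):
        positions, outcome = simulate_game(rng)
        weights = td_lambda_update(weights, positions, outcome,
                                   learn_mask=(False, True))
        history.append(weights[1])
    return history


if __name__ == "__main__":
    history = train()
    print("mobility weight after first game:", history[0])
    print("mobility weight after last game: ", history[-1])
```

In this toy setup every game pushes the mobility weight the same way. The
evaluation is never as certain as the final result, mobility always has the
same sign as the result, and the update cannot tell that material already
explains the outcome. So the weight only ever goes up, more slowly as tanh
saturates, but it never settles.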


