Computer Chess Club Archives




Subject: Re: MTD(f) (was Re: New(?) search idea.)

Author: Don Dailey

Date: 22:19:24 01/22/98



This is a good post from you.  It does address many of my misgivings
with MTD(f) and your experiences closely mirror mine.

First of all I didn't mean to imply you were not a performance freak,
I know in fact that you are.  Also I want to emphasize that I'm not
critical of you or anyone else who does not use MTD(f).  I can see
you clearly understand the problems involved and have made an informed
decision.  I can't even say for sure my decision to use it is right.

We do get a lot of benefit from it so it's hard to throw it away.
Your point about the PV's cancelling the benefit of the speedup is
probably not the case, at least for us.  I agree it cancels some
of the benefit but we do in fact predict a lot of moves.  It just
really pisses me off that occasionally a really obvious move like
a recapture does not get predicted and we have lost some free time.
But most of the time we get these.   A good solution for tournaments
is to do a short search for prediction and this will always produce
a good move.  We could do a 9 ply search for instance and stop if
it exceeds 1/2 second or some such hack.  It's ugly but workable.
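A sketch of that hack (my names here are hypothetical, not our actual code): run a fixed-depth search for the prediction, but stop deepening once a small time cap is exceeded, so you always come away with some completed iteration's move.

```python
import time

def predict_move(position, search, depth_limit=9, time_cap=0.5):
    """Pick a ponder move with a short fixed-depth search, aborting
    further deepening once the time cap is exceeded.  `search` is any
    function returning (score, best_move) for a given depth."""
    deadline = time.monotonic() + time_cap
    best_move = None
    for depth in range(1, depth_limit + 1):
        score, move = search(position, depth)
        best_move = move          # always keep a completed iteration
        if time.monotonic() > deadline:
            break                 # over the cap -- good enough
    return best_move

# toy stand-in for a real search, just to show the shape of the call
def toy_search(position, depth):
    return (0, "recapture")
```

Ugly, as I said, but it guarantees the obvious recapture gets predicted.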

I am with you in my distaste for this behavior.  I HATE these ugly
PV's I cannot trust.  I have yet to see what will happen when I
implement proper PV's but I'll let you know.  Your note is discouraging!
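By "proper" PV's I mean the textbook scheme of splicing the line together on the way back up the tree, rather than digging it out of the hash table afterwards.  A toy negamax sketch (nested lists stand in for a real move generator; none of this is Cilkchess code):

```python
INF = 10 ** 9

def search(node, alpha, beta, pv):
    """Negamax that builds the PV recursively.  A node is either a
    leaf score (int, from the mover's point of view) or a list of
    (move, child) pairs.  `pv` is an out-list, filled in place."""
    if isinstance(node, int):
        pv.clear()
        return node
    best = -INF
    for move, child in node:
        child_pv = []
        score = -search(child, -beta, -alpha, child_pv)
        if score > best:
            best = score
            pv[:] = [move] + child_pv   # splice the child's line on
        alpha = max(alpha, best)
        if alpha >= beta:
            break                       # fail high: stop searching siblings
    return best

tree = [("a", [("x", 3), ("y", -2)]),
        ("b", [("z", 1)])]
pv = []
print(search(tree, -INF, INF, pv), pv)   # -> 1 ['b', 'z']
```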

I realized you clearly knew what you were talking about (as if there
was any doubt!) when you brought up the issue of lazy evaluation!
Yes, mtd(f) plays hell with it.   I have always relied heavily on
lazy evaluation (I call it score fudging) but over the years have
grown to hate it.  With mtd(f) my experiences with it have been
awful.  mtd(f) (for those watching) relies heavily on the scores
returned from each probe.  With lazy evaluation you tend to return
relaxed scoring information which affects the efficiency of mtd(f)!
Your nodes per second go up, but the number of nodes needed goes up
to match it!  It really sucks!  There is also some bad side behavior
if your "fudge margin" is too small AND it crosses the beta boundary.
For instance you
interpret (sc - fudge) as a lower bound because it is above beta, but
fudge is too small and it turns out the real situation is that the
score is in fact an upper bound (below beta!)   Your error can
propagate to the root and be more than the difference between your
error and fudge when this happens.  What happens is blunders.
My solution to this one was to limit the evaluation so that it was
never greater than FUDGE in either direction.  Another hack.  I
discovered this one when the error got GREATER as I increased the
fudge margin until it exactly matched my error and was suddenly
fixed.  This never happened unless the error "crossed over!"
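To make the crossover concrete, here is a toy sketch (my names, not Cilkchess code).  `quick` is the cheap part of the score and `rest` the expensive remainder; the lazy cutoff's claimed lower bound is only sound if `rest` can never exceed FUDGE in magnitude, which is exactly what the clamp enforces.

```python
FUDGE = 150   # fudge margin in centipawns

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def lazy_eval(quick, rest, beta):
    """If the cheap score minus the margin already beats beta, skip
    the expensive terms and return (quick - FUDGE) as a lower bound.
    Clamping `rest` to +/-FUDGE is the hack described above: it keeps
    that bound honest even when the skipped terms would have swung
    the score by more than the margin."""
    if quick - FUDGE >= beta:
        return quick - FUDGE            # claimed: true score >= this
    return quick + clamp(rest, -FUDGE, FUDGE)

# The crossover bug: quick=100, rest=-300, beta=-60.  The lazy cutoff
# fires and claims "score >= -50", but the unclamped truth is -200,
# i.e. an *upper* bound below beta -- that error can ride all the way
# to the root.  With the clamp, the full eval can never contradict
# the lazy bound:
print(lazy_eval(100, -300, -60))   # -> -50 (lazy cutoff fires)
print(lazy_eval(100, -300, 200))   # -> -50 (full eval, clamped)
```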

But now I'm about to commit heresy.  I know this will generate a
ton of HATE mail but here goes (my arms now protecting my face):
I now believe lazy evaluation is problematic at best, incorrect in
general.  Even without mtd(f), as my evaluation has grown more
sophisticated I am noticing lots of problems with lazy evaluation.
I simply have too many big scoring terms now and it's getting harder
and harder to cascade them.  I used to calculate material and pawn
structure (looking for square of pawn stuff) and then fudge it.  Then
try the next largest big scoring thing etc.  But now I have aggressive
king safety, more aggressive pawn evaluation in general, and much more.
It's impossible to arrange my code
such that I can test these in cascading order to take advantage
of lazy eval.  Too many things depend on other things and I would
have to use too big a margin to fix it (which decreases its benefit.)
I end up doing almost all my evaluation anyway just to reach the big
scoring terms.

Even with fudging I get little benefit (I used to get 30% in the
old days when life was simpler), now get very little, so I'm not
crying too much over the loss of it.  YES I threw it out.  I feel
this huge burden being lifted now and have found peace!  I know
now that my scores are always correct and I don't lose any sleep
over it!    I am looking into some ways to do really fast evaluation
and still keep it sophisticated and will share my discoveries (assuming
I come up with any) with the group.   I'll probably reinvent someone
else's wheels!

Not to mention the way information is presented.  I always liked
watching the move list proceed and with MTD(f) this is gone too
(unless you want to watch it over and over for each probe of each
iteration!)  It's only a semantic issue but I prefer the normal
mini-max behavior.

Anyway, all your points are well taken.  MTD(f) is a strange beast
and I'm not really promoting it.   If anyone wants to try it out
who hasn't yet, be prepared for a real struggle.   I consider it
worth the problems and I believe it makes my program stronger.  But
it's not hard for me to believe it could also be the wrong choice
for a different program.  If your program relies heavily on score
fudging it will probably decrease your performance.
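For anyone who does want to try it, the bare driver (after Plaat's papers) is only a few lines.  This is the generic textbook sketch, not my enhanced version, and `zw_search` stands for any fail-soft zero-window search you supply:

```python
INF = 10 ** 9

def mtdf(root, guess, depth, zw_search):
    """Converge on the minimax score with zero-window probes.
    `zw_search(root, beta, depth)` must search the window
    (beta-1, beta) and be fail-soft: its return value is an upper
    bound on the true score if it fails low, a lower bound if it
    fails high."""
    g = guess
    lower, upper = -INF, INF
    while lower < upper:
        beta = max(g, lower + 1)     # probe just above the lower bound
        g = zw_search(root, beta, depth)
        if g < beta:
            upper = g                # failed low
        else:
            lower = g                # failed high
    return g

# toy fail-soft search with a fixed true score, to show convergence
print(mtdf(None, 0, 3, lambda root, beta, depth: 37))   # -> 37
```

Note how much the driver leans on the returned scores being tight: the more relaxed the bounds coming back (lazy eval!), the more probes it takes to close the lower/upper gap.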

- Don

On January 22, 1998 at 22:23:36, Robert Hyatt wrote:

>On January 22, 1998 at 21:59:57, Don Dailey wrote:
>>On January 22, 1998 at 14:28:27, Stuart Cracraft wrote:
>>>On January 22, 1998 at 13:33:42, Don Dailey wrote:
>>>>By the way, I use an algorithm called mtd(f) and there is no concept
>>>>of a "window" in the search itself (just the search driver.)  All
>>>>searches return quickly and are guaranteed to fail!  In some sense
>>>>the same moves get searched over and over on the same iteration
>>>>although the hash table tends to avoid most of this work (except
>>>>when it is needed to prove a point.)   The search driver is a tricky
>>>>piece of code and the obvious approach is quite bad (don't use a
>>>>binary search mtd probe strategy!)
>>>>The thing I hate about mtd is the data returned is so different
>>>>from a conventional program.  You cannot get a nice list of moves
>>>>showing progress and the pv appears many times (usually modified)
>>>>each time.  Even though I have good root move ordering I stopped
>>>>thinking about it with mtd(f).  Your ideas almost sounded foreign
>>>>to me because of this!   Massaging my root move list would clearly
>>>>be a problem for efficiency.   MTD is so good I cannot imagine
>>>>using anything else though.  I am getting over 15%  speedup over
>>>>my very best implementation of pvs search (margin on first move,
>>>>zero width on siblings with research on fail hi's)   I prefer
>>>>PVS's behavior but cannot ignore the extra performance of MTD(f).
>>>>The (f) refers to the type of driver used.  I don't strictly use
>>>>the 'f' driver because I found some enhancements of my own but
>>>>it's basically mtd(f)
>>>>- Don
>>>I tried MTD(f) at one point and was impressed as well. Also, I saw
>>>the tables in Aske Plaat's thesis comparing it with other search
>>>techniques for three strategy games. Those tables show really good
>>>speedups for chess.
>>>Now, the question. I still have my MTD(f) code but it does not
>>>record the principal variation.
>>>How did you do it? Please be as specific as you can. Now that my
>>>transposition table is working much better, I'd like to consider
>>>re-using my past MTD(f) code.
>>>15% is nothing to sneeze at.
>>Hi Stuart,
>>I have problems with PV's.  In Cilkchess 1, I implemented MTD(f) and
>>Aske Plaat improved the driver (I was doing it wrong and still getting
>>a small speedup over PVS.)  Aske Plaat implemented the PV reporting
>>which basically walked the hash table and picked up the best moves.
>>However I don't think this is a good strategy if you want accurate
>>PV's.   Cilkchess 1 had trouble reporting PV's and often the 2nd move
>>would be a silly blunder.  This was scary and made us think the search
>>was broken but it solved every problem in our test sets in the expected
>>depth.  We won the Dutch championship without a single loss out of 11 games
>>and pronounced the search SOUND.   We are now confident it's a PV
>>issue only.  But for the tournament we simply did a 2 ply search to
>>get the opponents move and hoped the quality of it was high enough to
>>be a good predictor.  We never got blunders but often predicted weaker
>>moves or did not see simple tactics in the prediction.
>>In Cilkchess 2.0 I implemented the PV again using the same technique.
>>At the Dutch this year we kept predicting these horrible moves for our
>>opponents and basically rarely got to think on the opponents time.  Our
>>hash table was broken too and we operated with 1/16 of the hash table
>>we could have used and this may also affect the PV thing.  We lost a
>>game and could only manage 2nd place with all that hardware and our PV
>>implementation is partly to blame (and the fact that the program
>>We tried to fix the PV problem in the middle of the tournament, mainly
>>by paying attention to "beta."   We have no Alpha in our search and
>>beta is equivalent to alpha (and beta) in other programs.  We did some
>>hack to make sure we did not change the 2nd move in the hash table
>>unless the score is properly bounded.  On my list of "things to do"
>>is to experiment with this in general (not for PV reasons) to see if
>>there are conditions under which you should or should not change the
>>best move in the hash table.  The end result seemed (this is a
>>judgement) to be an improvement and in 1 game we actually predicted
>>almost every single move of the opponent, having 45 minutes on our
>>clock at move 60 while the opponent had to reset for speedchess (they
>>allow this in this tournament.)   I thought the problem was fixed for
>>good but in the next few games the bad behavior reared its ugly head
>>again but seemed much less severe.  All of my non MTD programs had
>>sound PV's to the best of my knowledge.
>When I tried this, I used (eventually) the idea of storing two bounds.
>When I stored a fail-high bound, I stored the best move.  When I stored a
>fail-low move, there obviously is no best move to store.  In this case I
>left the fail-high move there.
>This partially worked.  But it is *far* from perfect.  Particularly if you
>get the occasional lucky break of getting one fail high, followed by
>fail lows.  It is possible to overwrite enough stuff that the PV simply
>vanishes.
>Of all the problems I had, this was the worst.  1. I want good moves.  Searching
>faster is a total loss if you can't predict good moves for the opponent,
>because the lost "ponder" time is *much* worse than 15%.
>Lazy eval is another big problem... because I cut off evaluating if
>I can't get close to alpha, or close to beta, depending.  And when I back a
>partial eval up to the root, it won't give me a good estimate for the next
>mtd(f) re-search.  So I move the window up, search again, and the score comes
>back even higher.  Until I finally reach a point where the lazy eval won't
>cut off early.  But 3-4-5 re-searches and this is not as efficient as pure PVS.
>That's where I left off.  My lazy eval gets much more than 15%.  I'd have
>to discard that or suffer through extra re-searches...
>>I am soon going to implement PV's the right way and tackle this problem.
>>Essentially a PV is just information passed toward the top of the tree (the
>>root node) in recursive fashion and is trivial to implement.   I don't
>>know what problems I will find but I intend to get it right.  I will
>>definitely post my results if you find it interesting.
>I don't see how you can get a good PV from mtd(f).  The problem is that
>every node fails either high or low.  The ones that fail low give you no
>information on which move is best.  The hash table can't be depended on to
>provide this any more than it can in a normal PVS search, because of the
>overwriting problem, not to mention the problem of storing fail-low nodes
>that are just as important (actually these are the *most* important when
>you measure performance) but which don't have a best move.  If you are
>lucky enough to overwrite an identical position that was a fail-high or
>exact node, you can save the best move, but this didn't happen often for me...
>>MTD(f) is hell bent on not doing any more work than necessary and causes
>>problems with PV's but I don't think they are unsolvable.  I definitely
>>want sound PV's.  I mentioned there were things about MTD I didn't like
>>and this is only one of them.   I would love to throw it out completely
>>and actually tried to, but could not even come within 15% of matching
>>its  performance.  It's really quite powerful on the serial as well as
>>the parallel program.   Unless you are a performance freak (which I am)
>>don't use it!
>Hmmm...  I have *always* been a performance freak.  :)  That's why I
>worked on parallel search algorithms..., wrote 20,000 lines of Cray
>assembly code, etc.  and then did the bitmap algorithms now used in
>Crafty...  I *never* ignore potential speedups...  When I first heard of
>it I downloaded Aske's paper about it and went to work...
>However, have you done a lazy eval?  If not you can get 15% there, but it
>will play hell with mtd(f) by not letting fail-soft give you a very good
>estimate.  Although in my case, when I turned it off, I still didn't get a
>very good estimate.  Closer than using either alpha or beta of course, but
>not close enough to make it pay off for me...
>>- Don


Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.