Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: MTD(f)

Author: Vincent Diepeveen

Date: 08:41:00 06/18/05

Go up one level in this thread


On June 18, 2005 at 06:00:37, Vasik Rajlich wrote:

>On June 17, 2005 at 23:16:10, Robert Hyatt wrote:
>
>>On June 17, 2005 at 18:53:53, Vincent Diepeveen wrote:
>>
>>>On June 16, 2005 at 18:33:11, Vasik Rajlich wrote:
>>>
>>>>On June 16, 2005 at 13:38:55, Vincent Diepeveen wrote:
>>>>
>>>>>On June 15, 2005 at 22:40:22, Tor Alexander Lattimore wrote:
>>>>>
>>>>>I wonder why there is still people using MTD in chessprograms above PVS.
>>>>>
>>>>>In PVS you can of course easily determine the maximum number of nodes you had to
>>>>>search extra on top of nullwindows. In DIEP it's just a couple of thousands at
>>>>>in total millions of moves.
>>>>>
>>>>>So the theoretic gain MTD CAN give, when you start measuring it, it is really
>>>>>negative.
>>>>>
>>>>
>>>>I agree now after some bad experiments that MTD (f) isn't worth the trouble, but
>>>>the refutation isn't quite this simple.
>>>>
>>>>Imagine white has two moves, x and y. x is weak, but you search it first with a
>>>>big window. The first response to x gives a score of -2.0, and now you search
>>>>all the other responses to x (with null-windows around -2.0), and after a big
>>>>search x is able to maintain the -2.0. Note that the vast majority of this
>>>>search is null-window.
>>>>Now finally you search y, and it gives you 0.0.
>>>>You could have skipped all (almost all) of those zero-window searches of x, if
>>>>only you had switched to y more quickly. That's what MTD (f) tries to do. ?>You're
>>>>not committing to a move. Basically a wide window commits you to the first move
>>>>you happen to try - you're going to search it anywhere inside that window.
>>>
>>>The basic flaw in the above argumentation is that it requires MTD to have
>>>somehow from the skies a magical sorting S which is a better one than PVS gives
>>>you.
>>>
>>>I directly bury that idea. MTD does *not* have a better move ordering than PVS
>>>gives you.
>>>
>>>The insight is just not there, let alone the proof, that MTD *can* give a better
>>>move ordering than PVS.
>>
>>Actually it does have better ordering.  Because you end up searching the _same_
>>set of moves, to the same search depth, repeatedly.  Two passes, one above the
>>true score and one below the true score, and your move ordering ends up _really_
>>good.
>>
>
>Let me put it like this: in MTD (f), if the first move you happen to try is
>weak, you will spend less time on it. You will just prove that it's worse than
>your first alpha, and continue to the next move. In PVS, you will have to show
>exactly where in the window it falls (or prove that it falls outside of the
>whole window).

That is of course not the case. In PVS you directly start with a correct search
bound and MTD starts with the wrong bound.

For example PVS directly has from hashtable + 1 ply extra search a 0.231 score
(which will later become the PV score) and is all the time searching with a
window [0.231;0.232], so if moves fail low and fail high that won't screw your
move ordering.

So the overhead for PVS to get a very accurate bound for this first root move is
exactly 50-100 nodes or so (exactly 1 ply on top of the previous mainline).

In MTD you start for example with a first window 0.100;0.100 and besides that
you need to research like 25 times or so to get to that 0.231, a major problem
is that nonstop all kind of flips happen and the best move in your hashtable is
completely screwing your move ordering.

MTD only can work at random trees of 30 line programs, which is not the case in
computerchess. We have all kind of move ordering heuristics and a genius
hashtable working for us.

>You will see this behavior in any PVS engine - even a good one. At depth=12, it
>thinks move x is best with a score of 0.25. (Let's say.) At depth=13, move x
>becomes rather weak, and the engine goes on to show that x scores exactly -0.20
>before even considering any other move. This is potentially a big search. Then,
>the search continues on to move y, which is right back to +0.25 or so.

Important in reality is that PVS directly has that fail low to -0.20 after which
we time extend. In MTD it takes ages *before* you have that fail low to -0.20.

It is very cheap to get a mainline with PVS of that -0.20 if in MTD you first
start searching at 0.24;0.25 then at 0.23;0.24 then at 0.22;0.24 then obviously
the first few searches are way more expensive than that single PVS search to get
-0.20

A huge score change MTD always was at its weakest historically, please don't
reverse the facts.

The idea of MTD was that if you are now at 0.24 that it is pretty cheap to get
the next pv line if the score of it also is 0.24

However thanks to having a hashtable nowadays that is equally cheap with PVS
too.

Any huge score change MTD just pukes out and takes forever to get a root score
for your new move.

Additional it is very unlikely if a move fails low from 0.25 to -0.20 now at 13
ply that there is any escape possible to a different move score 0.25. More
likely that move will have perhaps -0.05 or something.

So in practice most MTD folks figured out that you NEED to first get some decent
estimate on how low this moves score is.

Using some binary search of course defeats the 'cheapness' of MTD.

When you just did a search of 0.24 that failed low, then now searching with a
bound 0.23 is cheaper of course than searching with a bound of 0.00

So obviously you'll have to do a number of searches of 0.24 0.23 0.22 0.21 0.20
and so on. Only when say 20 consecutive tried with MTD failed you can try first
and go have a look whether some other move is ok, or you can try some PVS type
thing and start a PVS search to determine quickly the real score.

So when PVS already is 1 ply deeper, then you will have figured out that the
score is -0.20 here.

>This inefficiency is also repeated inside the tree, and it is what MTD (f) tries
>to avoid.

MTD's worst case is there when the score drops some in the root.

If you drop regurarly 200 points in the root then MTD will royally screw you
with its huge overhead.

>I guess I see MTD (f) as a promising and logical bad idea, rather than a
>completely pointless one :)

No serious researcher ever took MTD serious in this sense that it would work.

That's probably why there was not a second paper which showed the results of MTD
versus PVS at a strong program, instead of just some random tree experiments
with a plydepth of 5 ply or so.

>>Of course, that doesn't mean the final cumulative search tree is smaller than
>>PVS.  For me it never was.  But I'll give someone the doubt that they did a
>>better job of it than I did.  I only fooled with it for a couple of months
>>before tossing it.
>>
>>>
>>>>Anyway it's way too late at night to list all the problems with MTD (f) :)
>>>>Vas
>>>
>>>Of course MTD has another 10 problems why you don't want to use it. However the
>>>best insight we get in how outdated it is, is when looking to world champs 2004.
>>>
>>>There Stefan Meyer-Kahlen, someone who will be the last on the planet to do
>>>suggestions to others on how to improve their program, at the world champs 2004
>>>he very careful suggested to Rudolf Huber (ParSOS) that: "Perhaps it is time to
>>>consider dropping MTD and getting back some more decent search algorithm like
>>>PVS for example."
>>>
>>>I have never ever heard Stefan say such a clear suggestion to anyone on this
>>>planet like that.
>>>
>>>So perhaps...
>>>
>
>Most of the search experts don't like MTD (f). Tord has stopped using it, Fabien
>also never liked it. (A discussion from CCC IIRC ..)

>As I see it, it has one huge flaw: when you come to a node, you don't know the
>"true" bounds (ie. the eventual range of values you'll come to that node with),
>and you can't use it to make search decisions.

In chess what matters is not producing a bad move. MTD has a horrible worst case
when you fail low at a big search depth a point or 200. Please note 200 points
in DIEP is just 0.2 pawn so that happens regurarly.

>The most simple example is null move. In every PVS engine I've seen, null move
>is required to fail high. Nobody allows null moves to score inside the window -
>ie. nobody will put null moves in the PVs. In MTD (f), though, if you do a null
>move at a node at the alpha value, and it fails high, you just potentially put a
>null move into your PV.

The cache trashing latency overhead of MTD is too expensive for fast searching
programs. A searcher at 1 million nps or so can never use MTD. He'll lose 40% of
his system time just to memory trashing. MTD needs absolutely that you store
transposition table in qsearch.

So especially at parallel machines it won't run above 8 cpu's EVER.

I did some tests with SOS at a 64 processor itanium2 and it didn't scale above 8
cpu's simply. Scaling just stopped. As *that* was the maximum the memory
controllers could handle a second. So above 8 cpu's it just became slower rather
than faster. I actually tried up to 16 cpu's.

>(This also happens in a PVS engine if a null move fails high, but a research is
>later needed.)
>
>Anyway, there are a number of possible solutions to these problems, but then you
>have to ask yourself if it's worth the trouble. If the final savings are 10% of
>nodes, is it worth dealing with these issues?

The overhead of PVS is real real little.

We can have endless discussions about pvs versus mtd, but until you start
realizing the small overhead of PVS (IF YOU HAVE A DECENT MOVE ORDERING AND A
GOOD FUNCTIONING HASHTABLE) then under that condition MTD will of course have 0%
chance, especially if you consider its huge drawbacks when you fail low 200
points or so.

Vincent

>Vas
>
>>>>>For material only searches or evaluation functions which have little fluctuation
>>>and a small range it's pretty fast though to use MTD, provided you don't mind to
>>>>>lose sometimes games when you hit its worse case.
>>>>>
>>>>>The worst case you can see in the program ParSOS very well at world champs,
>>>>>when it starts to doubt then it has a search depth of plies less to the others.
>>>>>
>>>>>Nevertheless here my answers:
>>>>>
>>>>>>Hi
>>>>>>Is it possible to use a single variable in the MTD(f) search? Something like
>>>>>>this:
>>>>>>
>>>>>>int MTD(int depth, int guess)
>>>>>>{
>>>>>> if (depth<1) return Evaluate();
>>>>>
>>>>>Using hashtable in qsearch of course is a necessity for MTD because of the many
>>>>>researches you're doing.
>>>>>
>>>>>So for very fast searching software MTD is pretty expensive to use, as the
>>>>>hashtable more and more gets a big overhead. Crafty (correct me if i'm wrong for
>>>>>latest version) isn't hashing in qsearch for example.
>>>>>
>>>>>> MOVE move_to_search;
>>>>>> int best=-INFINITY;
>>>>>> GenMoves();
>>>>>> while (GetNextMove(&move_to_search))
>>>>>> {
>>>>>>  PlayMove(move_to_search);
>>>>>>  val = -MTD(depth - 1, -guess + 1);
>>>>>
>>>>>If you just search with 1 bound, then it's obvious to do:
>>>>>  val = -MTD(depth - 1, -guess);
>>>>>
>>>>>>  UnPlayMove(move_to_search);
>>>>>>  if (val>best)
>>>>>>  {
>>>>>>   best=val;
>>>>>>   if (val>=guess)
>>>>>      break;
>>>>>>  }
>>>>>> }
>>>>>> return (best);
>>>>>>}
>>>>>>
>>>>>>perhaps there is something very wrong with this? or perhaps it's used already, I
>>>>>>just noticed that on Aske Plaat's site he always uses an Alpha-Beta search with
>>>>>>0 width windowed searches, but doesn't this do the same thing? Is using
>>>>>>fail-soft type algorithms used in MTD(f) since it could well help zoom into the
>>>>>>correct score sooner?
>>>>>>
>>>>>>Cheers
>>>>>>Tor



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.