Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: MTD(f)

Author: Vincent Diepeveen

Date: 05:27:42 06/20/05

Go up one level in this thread


On June 18, 2005 at 22:48:38, Robert Hyatt wrote:

>On June 18, 2005 at 11:41:00, Vincent Diepeveen wrote:
>
>>On June 18, 2005 at 06:00:37, Vasik Rajlich wrote:
>>
>>>On June 17, 2005 at 23:16:10, Robert Hyatt wrote:
>>>
>>>>On June 17, 2005 at 18:53:53, Vincent Diepeveen wrote:
>>>>
>>>>>On June 16, 2005 at 18:33:11, Vasik Rajlich wrote:
>>>>>
>>>>>>On June 16, 2005 at 13:38:55, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On June 15, 2005 at 22:40:22, Tor Alexander Lattimore wrote:
>>>>>>>
>>>>>>>I wonder why there is still people using MTD in chessprograms above PVS.
>>>>>>>
>>>>>>>In PVS you can of course easily determine the maximum number of nodes you had to
>>>>>>>search extra on top of nullwindows. In DIEP it's just a couple of thousands at
>>>>>>>in total millions of moves.
>>>>>>>
>>>>>>>So the theoretic gain MTD CAN give, when you start measuring it, it is really
>>>>>>>negative.
>>>>>>>
>>>>>>
>>>>>>I agree now after some bad experiments that MTD (f) isn't worth the trouble, but
>>>>>>the refutation isn't quite this simple.
>>>>>>
>>>>>>Imagine white has two moves, x and y. x is weak, but you search it first with a
>>>>>>big window. The first response to x gives a score of -2.0, and now you search
>>>>>>all the other responses to x (with null-windows around -2.0), and after a big
>>>>>>search x is able to maintain the -2.0. Note that the vast majority of this
>>>>>>search is null-window.
>>>>>>Now finally you search y, and it gives you 0.0.
>>>>>>You could have skipped all (almost all) of those zero-window searches of x, if
>>>>>>only you had switched to y more quickly. That's what MTD (f) tries to do. ?>You're
>>>>>>not committing to a move. Basically a wide window commits you to the first move
>>>>>>you happen to try - you're going to search it anywhere inside that window.
>>>>>
>>>>>The basic flaw in the above argumentation is that it requires MTD to have
>>>>>somehow from the skies a magical sorting S which is a better one than PVS gives
>>>>>you.
>>>>>
>>>>>I directly bury that idea. MTD does *not* have a better move ordering than PVS
>>>>>gives you.
>>>>>
>>>>>The insight is just not there, let alone the proof, that MTD *can* give a better
>>>>>move ordering than PVS.
>>>>
>>>>Actually it does have better ordering.  Because you end up searching the _same_
>>>>set of moves, to the same search depth, repeatedly.  Two passes, one above the
>>>>true score and one below the true score, and your move ordering ends up _really_
>>>>good.
>>>>
>>>
>>>Let me put it like this: in MTD (f), if the first move you happen to try is
>>>weak, you will spend less time on it. You will just prove that it's worse than
>>>your first alpha, and continue to the next move. In PVS, you will have to show
>>>exactly where in the window it falls (or prove that it falls outside of the
>>>whole window).
>>
>>That is of course not the case. In PVS you directly start with a correct search
>>bound and MTD starts with the wrong bound
>starting with +/- infinity is not necessarily better than starting with -1.01,
>-1.00.  I can show you positions that have a zillion scores (wrong scores
>however) between -Mate99 and -Queen, yet the true score is +1.0.  Searching
>those with +/- infinity is murder, while mtd(f) won't get hung up in the wrong
>area.

If we are at 13 ply now and -1.010 we start 14 ply.
PVS gets mainline from hashtable to where score was -1.060, adds 1 ply to it
there then it gets at 1 ply depthleft returned the new bound of -1.110 and the
rest of the PVS search you have minimal window searches at -1.060

So with overhead of say 50 nodes or so you determined the root bound.

MTD needs to make 20 trips around the world before starting its search with a
bound of -1.060. PVS directly started with that bound.

50 nodes overhead is just nothing.

Of course it will be more overhead when you use random trees or no qsearch at
your program. I vaguely remember Aske Plaat using random searches more or less
without using a quiescencesearch to proof that MTD worked. Even in those days a
normal program would not get helped by MTD. He needed to cut the quiecensesearch
out of the program in order to prove that MTD was interesting to be tried.

Vincent

>>
>>For example PVS directly has from hashtable + 1 ply extra search a 0.231 score
>>(which will later become the PV score) and is all the time searching with a
>>window [0.231;0.232], so if moves fail low and fail high that won't screw your
>>move ordering.
>>
>>So the overhead for PVS to get a very accurate bound for this first root move is
>>exactly 50-100 nodes or so (exactly 1 ply on top of the previous mainline).
>>
>>In MTD you start for example with a first window 0.100;0.100 and besides that
>>you need to research like 25 times or so to get to that 0.231, a major problem
>>is that nonstop all kind of flips happen and the best move in your hashtable is
>>completely screwing your move ordering.
>>
>>MTD only can work at random trees of 30 line programs, which is not the case in
>>computerchess. We have all kind of move ordering heuristics and a genius
>>hashtable working for us.
>>
>>>You will see this behavior in any PVS engine - even a good one. At depth=12, it
>>>thinks move x is best with a score of 0.25. (Let's say.) At depth=13, move x
>>>becomes rather weak, and the engine goes on to show that x scores exactly -0.20
>>>before even considering any other move. This is potentially a big search. Then,
>>>the search continues on to move y, which is right back to +0.25 or so.
>>
>>Important in reality is that PVS directly has that fail low to -0.20 after which
>>we time extend. In MTD it takes ages *before* you have that fail low to -0.20.
>>
>>It is very cheap to get a mainline with PVS of that -0.20 if in MTD you first
>>start searching at 0.24;0.25 then at 0.23;0.24 then at 0.22;0.24 then obviously
>>the first few searches are way more expensive than that single PVS search to get
>>-0.20
>>
>>A huge score change MTD always was at its weakest historically, please don't
>>reverse the facts.
>>
>>The idea of MTD was that if you are now at 0.24 that it is pretty cheap to get
>>the next pv line if the score of it also is 0.24
>>
>>However thanks to having a hashtable nowadays that is equally cheap with PVS
>>too.
>>
>>Any huge score change MTD just pukes out and takes forever to get a root score
>>for your new move.
>>
>>Additional it is very unlikely if a move fails low from 0.25 to -0.20 now at 13
>>ply that there is any escape possible to a different move score 0.25. More
>>likely that move will have perhaps -0.05 or something.
>>
>>So in practice most MTD folks figured out that you NEED to first get some decent
>>estimate on how low this moves score is.
>>
>>Using some binary search of course defeats the 'cheapness' of MTD.
>>
>>When you just did a search of 0.24 that failed low, then now searching with a
>>bound 0.23 is cheaper of course than searching with a bound of 0.00
>>
>>So obviously you'll have to do a number of searches of 0.24 0.23 0.22 0.21 0.20
>>and so on. Only when say 20 consecutive tried with MTD failed you can try first
>>and go have a look whether some other move is ok, or you can try some PVS type
>>thing and start a PVS search to determine quickly the real score.
>>
>>So when PVS already is 1 ply deeper, then you will have figured out that the
>>score is -0.20 here.
>>
>>>This inefficiency is also repeated inside the tree, and it is what MTD (f) tries
>>>to avoid.
>>
>>MTD's worst case is there when the score drops some in the root.
>>
>>If you drop regurarly 200 points in the root then MTD will royally screw you
>>with its huge overhead.
>>
>>>I guess I see MTD (f) as a promising and logical bad idea, rather than a
>>>completely pointless one :)
>>
>>No serious researcher ever took MTD serious in this sense that it would work.
>>
>>That's probably why there was not a second paper which showed the results of MTD
>>versus PVS at a strong program, instead of just some random tree experiments
>>with a plydepth of 5 ply or so.
>>
>>>>Of course, that doesn't mean the final cumulative search tree is smaller than
>>>>PVS.  For me it never was.  But I'll give someone the doubt that they did a
>>>>better job of it than I did.  I only fooled with it for a couple of months
>>>>before tossing it.
>>>>
>>>>>
>>>>>>Anyway it's way too late at night to list all the problems with MTD (f) :)
>>>>>>Vas
>>>>>
>>>>>Of course MTD has another 10 problems why you don't want to use it. However the
>>>>>best insight we get in how outdated it is, is when looking to world champs 2004.
>>>>>
>>>>>There Stefan Meyer-Kahlen, someone who will be the last on the planet to do
>>>>>suggestions to others on how to improve their program, at the world champs 2004
>>>>>he very careful suggested to Rudolf Huber (ParSOS) that: "Perhaps it is time to
>>>>>consider dropping MTD and getting back some more decent search algorithm like
>>>>>PVS for example."
>>>>>
>>>>>I have never ever heard Stefan say such a clear suggestion to anyone on this
>>>>>planet like that.
>>>>>
>>>>>So perhaps...
>>>>>
>>>
>>>Most of the search experts don't like MTD (f). Tord has stopped using it, Fabien
>>>also never liked it. (A discussion from CCC IIRC ..)
>>
>>>As I see it, it has one huge flaw: when you come to a node, you don't know the
>>>"true" bounds (ie. the eventual range of values you'll come to that node with),
>>>and you can't use it to make search decisions.
>>
>>In chess what matters is not producing a bad move. MTD has a horrible worst case
>>when you fail low at a big search depth a point or 200. Please note 200 points
>>in DIEP is just 0.2 pawn so that happens regurarly.
>>
>>>The most simple example is null move. In every PVS engine I've seen, null move
>>>is required to fail high. Nobody allows null moves to score inside the window -
>>>ie. nobody will put null moves in the PVs. In MTD (f), though, if you do a null
>>>move at a node at the alpha value, and it fails high, you just potentially put a
>>>null move into your PV.
>>
>>The cache trashing latency overhead of MTD is too expensive for fast searching
>>programs. A searcher at 1 million nps or so can never use MTD. He'll lose 40% of
>>his system time just to memory trashing. MTD needs absolutely that you store
>>transposition table in qsearch.
>>
>>So especially at parallel machines it won't run above 8 cpu's EVER.
>>
>>I did some tests with SOS at a 64 processor itanium2 and it didn't scale above 8
>>cpu's simply. Scaling just stopped. As *that* was the maximum the memory
>>controllers could handle a second. So above 8 cpu's it just became slower rather
>>than faster. I actually tried up to 16 cpu's.
>>
>>>(This also happens in a PVS engine if a null move fails high, but a research is
>>>later needed.)
>>>
>>>Anyway, there are a number of possible solutions to these problems, but then you
>>>have to ask yourself if it's worth the trouble. If the final savings are 10% of
>>>nodes, is it worth dealing with these issues?
>>
>>The overhead of PVS is real real little.
>>
>>We can have endless discussions about pvs versus mtd, but until you start
>>realizing the small overhead of PVS (IF YOU HAVE A DECENT MOVE ORDERING AND A
>>GOOD FUNCTIONING HASHTABLE) then under that condition MTD will of course have 0%
>>chance, especially if you consider its huge drawbacks when you fail low 200
>>points or so.
>>
>>Vincent
>>
>>>Vas
>>>
>>>>>>>For material only searches or evaluation functions which have little fluctuation
>>>>>and a small range it's pretty fast though to use MTD, provided you don't mind to
>>>>>>>lose sometimes games when you hit its worse case.
>>>>>>>
>>>>>>>The worst case you can see in the program ParSOS very well at world champs,
>>>>>>>when it starts to doubt then it has a search depth of plies less to the others.
>>>>>>>
>>>>>>>Nevertheless here my answers:
>>>>>>>
>>>>>>>>Hi
>>>>>>>>Is it possible to use a single variable in the MTD(f) search? Something like
>>>>>>>>this:
>>>>>>>>
>>>>>>>>int MTD(int depth, int guess)
>>>>>>>>{
>>>>>>>> if (depth<1) return Evaluate();
>>>>>>>
>>>>>>>Using hashtable in qsearch of course is a necessity for MTD because of the many
>>>>>>>researches you're doing.
>>>>>>>
>>>>>>>So for very fast searching software MTD is pretty expensive to use, as the
>>>>>>>hashtable more and more gets a big overhead. Crafty (correct me if i'm wrong for
>>>>>>>latest version) isn't hashing in qsearch for example.
>>>>>>>
>>>>>>>> MOVE move_to_search;
>>>>>>>> int best=-INFINITY;
>>>>>>>> GenMoves();
>>>>>>>> while (GetNextMove(&move_to_search))
>>>>>>>> {
>>>>>>>>  PlayMove(move_to_search);
>>>>>>>>  val = -MTD(depth - 1, -guess + 1);
>>>>>>>
>>>>>>>If you just search with 1 bound, then it's obvious to do:
>>>>>>>  val = -MTD(depth - 1, -guess);
>>>>>>>
>>>>>>>>  UnPlayMove(move_to_search);
>>>>>>>>  if (val>best)
>>>>>>>>  {
>>>>>>>>   best=val;
>>>>>>>>   if (val>=guess)
>>>>>>>      break;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> return (best);
>>>>>>>>}
>>>>>>>>
>>>>>>>>perhaps there is something very wrong with this? or perhaps it's used already, I
>>>>>>>>just noticed that on Aske Plaat's site he always uses an Alpha-Beta search with
>>>>>>>>0 width windowed searches, but doesn't this do the same thing? Is using
>>>>>>>>fail-soft type algorithms used in MTD(f) since it could well help zoom into the
>>>>>>>>correct score sooner?
>>>>>>>>
>>>>>>>>Cheers
>>>>>>>>Tor



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.