Author: Vincent Diepeveen
Date: 05:27:42 06/20/05
Go up one level in this thread
On June 18, 2005 at 22:48:38, Robert Hyatt wrote: >On June 18, 2005 at 11:41:00, Vincent Diepeveen wrote: > >>On June 18, 2005 at 06:00:37, Vasik Rajlich wrote: >> >>>On June 17, 2005 at 23:16:10, Robert Hyatt wrote: >>> >>>>On June 17, 2005 at 18:53:53, Vincent Diepeveen wrote: >>>> >>>>>On June 16, 2005 at 18:33:11, Vasik Rajlich wrote: >>>>> >>>>>>On June 16, 2005 at 13:38:55, Vincent Diepeveen wrote: >>>>>> >>>>>>>On June 15, 2005 at 22:40:22, Tor Alexander Lattimore wrote: >>>>>>> >>>>>>>I wonder why there is still people using MTD in chessprograms above PVS. >>>>>>> >>>>>>>In PVS you can of course easily determine the maximum number of nodes you had to >>>>>>>search extra on top of nullwindows. In DIEP it's just a couple of thousands at >>>>>>>in total millions of moves. >>>>>>> >>>>>>>So the theoretic gain MTD CAN give, when you start measuring it, it is really >>>>>>>negative. >>>>>>> >>>>>> >>>>>>I agree now after some bad experiments that MTD (f) isn't worth the trouble, but >>>>>>the refutation isn't quite this simple. >>>>>> >>>>>>Imagine white has two moves, x and y. x is weak, but you search it first with a >>>>>>big window. The first response to x gives a score of -2.0, and now you search >>>>>>all the other responses to x (with null-windows around -2.0), and after a big >>>>>>search x is able to maintain the -2.0. Note that the vast majority of this >>>>>>search is null-window. >>>>>>Now finally you search y, and it gives you 0.0. >>>>>>You could have skipped all (almost all) of those zero-window searches of x, if >>>>>>only you had switched to y more quickly. That's what MTD (f) tries to do. ?>You're >>>>>>not committing to a move. Basically a wide window commits you to the first move >>>>>>you happen to try - you're going to search it anywhere inside that window. >>>>> >>>>>The basic flaw in the above argumentation is that it requires MTD to have >>>>>somehow from the skies a magical sorting S which is a better one than PVS gives >>>>>you. >>>>> >>>>>I directly bury that idea. MTD does *not* have a better move ordering than PVS >>>>>gives you. >>>>> >>>>>The insight is just not there, let alone the proof, that MTD *can* give a better >>>>>move ordering than PVS. >>>> >>>>Actually it does have better ordering. Because you end up searching the _same_ >>>>set of moves, to the same search depth, repeatedly. Two passes, one above the >>>>true score and one below the true score, and your move ordering ends up _really_ >>>>good. >>>> >>> >>>Let me put it like this: in MTD (f), if the first move you happen to try is >>>weak, you will spend less time on it. You will just prove that it's worse than >>>your first alpha, and continue to the next move. In PVS, you will have to show >>>exactly where in the window it falls (or prove that it falls outside of the >>>whole window). >> >>That is of course not the case. In PVS you directly start with a correct search >>bound and MTD starts with the wrong bound >starting with +/- infinity is not necessarily better than starting with -1.01, >-1.00. I can show you positions that have a zillion scores (wrong scores >however) between -Mate99 and -Queen, yet the true score is +1.0. Searching >those with +/- infinity is murder, while mtd(f) won't get hung up in the wrong >area. If we are at 13 ply now and -1.010 we start 14 ply. PVS gets mainline from hashtable to where score was -1.060, adds 1 ply to it there then it gets at 1 ply depthleft returned the new bound of -1.110 and the rest of the PVS search you have minimal window searches at -1.060 So with overhead of say 50 nodes or so you determined the root bound. MTD needs to make 20 trips around the world before starting its search with a bound of -1.060. PVS directly started with that bound. 50 nodes overhead is just nothing. Of course it will be more overhead when you use random trees or no qsearch at your program. I vaguely remember Aske Plaat using random searches more or less without using a quiescencesearch to proof that MTD worked. Even in those days a normal program would not get helped by MTD. He needed to cut the quiecensesearch out of the program in order to prove that MTD was interesting to be tried. Vincent >> >>For example PVS directly has from hashtable + 1 ply extra search a 0.231 score >>(which will later become the PV score) and is all the time searching with a >>window [0.231;0.232], so if moves fail low and fail high that won't screw your >>move ordering. >> >>So the overhead for PVS to get a very accurate bound for this first root move is >>exactly 50-100 nodes or so (exactly 1 ply on top of the previous mainline). >> >>In MTD you start for example with a first window 0.100;0.100 and besides that >>you need to research like 25 times or so to get to that 0.231, a major problem >>is that nonstop all kind of flips happen and the best move in your hashtable is >>completely screwing your move ordering. >> >>MTD only can work at random trees of 30 line programs, which is not the case in >>computerchess. We have all kind of move ordering heuristics and a genius >>hashtable working for us. >> >>>You will see this behavior in any PVS engine - even a good one. At depth=12, it >>>thinks move x is best with a score of 0.25. (Let's say.) At depth=13, move x >>>becomes rather weak, and the engine goes on to show that x scores exactly -0.20 >>>before even considering any other move. This is potentially a big search. Then, >>>the search continues on to move y, which is right back to +0.25 or so. >> >>Important in reality is that PVS directly has that fail low to -0.20 after which >>we time extend. In MTD it takes ages *before* you have that fail low to -0.20. >> >>It is very cheap to get a mainline with PVS of that -0.20 if in MTD you first >>start searching at 0.24;0.25 then at 0.23;0.24 then at 0.22;0.24 then obviously >>the first few searches are way more expensive than that single PVS search to get >>-0.20 >> >>A huge score change MTD always was at its weakest historically, please don't >>reverse the facts. >> >>The idea of MTD was that if you are now at 0.24 that it is pretty cheap to get >>the next pv line if the score of it also is 0.24 >> >>However thanks to having a hashtable nowadays that is equally cheap with PVS >>too. >> >>Any huge score change MTD just pukes out and takes forever to get a root score >>for your new move. >> >>Additional it is very unlikely if a move fails low from 0.25 to -0.20 now at 13 >>ply that there is any escape possible to a different move score 0.25. More >>likely that move will have perhaps -0.05 or something. >> >>So in practice most MTD folks figured out that you NEED to first get some decent >>estimate on how low this moves score is. >> >>Using some binary search of course defeats the 'cheapness' of MTD. >> >>When you just did a search of 0.24 that failed low, then now searching with a >>bound 0.23 is cheaper of course than searching with a bound of 0.00 >> >>So obviously you'll have to do a number of searches of 0.24 0.23 0.22 0.21 0.20 >>and so on. Only when say 20 consecutive tried with MTD failed you can try first >>and go have a look whether some other move is ok, or you can try some PVS type >>thing and start a PVS search to determine quickly the real score. >> >>So when PVS already is 1 ply deeper, then you will have figured out that the >>score is -0.20 here. >> >>>This inefficiency is also repeated inside the tree, and it is what MTD (f) tries >>>to avoid. >> >>MTD's worst case is there when the score drops some in the root. >> >>If you drop regurarly 200 points in the root then MTD will royally screw you >>with its huge overhead. >> >>>I guess I see MTD (f) as a promising and logical bad idea, rather than a >>>completely pointless one :) >> >>No serious researcher ever took MTD serious in this sense that it would work. >> >>That's probably why there was not a second paper which showed the results of MTD >>versus PVS at a strong program, instead of just some random tree experiments >>with a plydepth of 5 ply or so. >> >>>>Of course, that doesn't mean the final cumulative search tree is smaller than >>>>PVS. For me it never was. But I'll give someone the doubt that they did a >>>>better job of it than I did. I only fooled with it for a couple of months >>>>before tossing it. >>>> >>>>> >>>>>>Anyway it's way too late at night to list all the problems with MTD (f) :) >>>>>>Vas >>>>> >>>>>Of course MTD has another 10 problems why you don't want to use it. However the >>>>>best insight we get in how outdated it is, is when looking to world champs 2004. >>>>> >>>>>There Stefan Meyer-Kahlen, someone who will be the last on the planet to do >>>>>suggestions to others on how to improve their program, at the world champs 2004 >>>>>he very careful suggested to Rudolf Huber (ParSOS) that: "Perhaps it is time to >>>>>consider dropping MTD and getting back some more decent search algorithm like >>>>>PVS for example." >>>>> >>>>>I have never ever heard Stefan say such a clear suggestion to anyone on this >>>>>planet like that. >>>>> >>>>>So perhaps... >>>>> >>> >>>Most of the search experts don't like MTD (f). Tord has stopped using it, Fabien >>>also never liked it. (A discussion from CCC IIRC ..) >> >>>As I see it, it has one huge flaw: when you come to a node, you don't know the >>>"true" bounds (ie. the eventual range of values you'll come to that node with), >>>and you can't use it to make search decisions. >> >>In chess what matters is not producing a bad move. MTD has a horrible worst case >>when you fail low at a big search depth a point or 200. Please note 200 points >>in DIEP is just 0.2 pawn so that happens regurarly. >> >>>The most simple example is null move. In every PVS engine I've seen, null move >>>is required to fail high. Nobody allows null moves to score inside the window - >>>ie. nobody will put null moves in the PVs. In MTD (f), though, if you do a null >>>move at a node at the alpha value, and it fails high, you just potentially put a >>>null move into your PV. >> >>The cache trashing latency overhead of MTD is too expensive for fast searching >>programs. A searcher at 1 million nps or so can never use MTD. He'll lose 40% of >>his system time just to memory trashing. MTD needs absolutely that you store >>transposition table in qsearch. >> >>So especially at parallel machines it won't run above 8 cpu's EVER. >> >>I did some tests with SOS at a 64 processor itanium2 and it didn't scale above 8 >>cpu's simply. Scaling just stopped. As *that* was the maximum the memory >>controllers could handle a second. So above 8 cpu's it just became slower rather >>than faster. I actually tried up to 16 cpu's. >> >>>(This also happens in a PVS engine if a null move fails high, but a research is >>>later needed.) >>> >>>Anyway, there are a number of possible solutions to these problems, but then you >>>have to ask yourself if it's worth the trouble. If the final savings are 10% of >>>nodes, is it worth dealing with these issues? >> >>The overhead of PVS is real real little. >> >>We can have endless discussions about pvs versus mtd, but until you start >>realizing the small overhead of PVS (IF YOU HAVE A DECENT MOVE ORDERING AND A >>GOOD FUNCTIONING HASHTABLE) then under that condition MTD will of course have 0% >>chance, especially if you consider its huge drawbacks when you fail low 200 >>points or so. >> >>Vincent >> >>>Vas >>> >>>>>>>For material only searches or evaluation functions which have little fluctuation >>>>>and a small range it's pretty fast though to use MTD, provided you don't mind to >>>>>>>lose sometimes games when you hit its worse case. >>>>>>> >>>>>>>The worst case you can see in the program ParSOS very well at world champs, >>>>>>>when it starts to doubt then it has a search depth of plies less to the others. >>>>>>> >>>>>>>Nevertheless here my answers: >>>>>>> >>>>>>>>Hi >>>>>>>>Is it possible to use a single variable in the MTD(f) search? Something like >>>>>>>>this: >>>>>>>> >>>>>>>>int MTD(int depth, int guess) >>>>>>>>{ >>>>>>>> if (depth<1) return Evaluate(); >>>>>>> >>>>>>>Using hashtable in qsearch of course is a necessity for MTD because of the many >>>>>>>researches you're doing. >>>>>>> >>>>>>>So for very fast searching software MTD is pretty expensive to use, as the >>>>>>>hashtable more and more gets a big overhead. Crafty (correct me if i'm wrong for >>>>>>>latest version) isn't hashing in qsearch for example. >>>>>>> >>>>>>>> MOVE move_to_search; >>>>>>>> int best=-INFINITY; >>>>>>>> GenMoves(); >>>>>>>> while (GetNextMove(&move_to_search)) >>>>>>>> { >>>>>>>> PlayMove(move_to_search); >>>>>>>> val = -MTD(depth - 1, -guess + 1); >>>>>>> >>>>>>>If you just search with 1 bound, then it's obvious to do: >>>>>>> val = -MTD(depth - 1, -guess); >>>>>>> >>>>>>>> UnPlayMove(move_to_search); >>>>>>>> if (val>best) >>>>>>>> { >>>>>>>> best=val; >>>>>>>> if (val>=guess) >>>>>>> break; >>>>>>>> } >>>>>>>> } >>>>>>>> return (best); >>>>>>>>} >>>>>>>> >>>>>>>>perhaps there is something very wrong with this? or perhaps it's used already, I >>>>>>>>just noticed that on Aske Plaat's site he always uses an Alpha-Beta search with >>>>>>>>0 width windowed searches, but doesn't this do the same thing? Is using >>>>>>>>fail-soft type algorithms used in MTD(f) since it could well help zoom into the >>>>>>>>correct score sooner? >>>>>>>> >>>>>>>>Cheers >>>>>>>>Tor
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.