Author: Don Beal
Date: 11:22:53 05/23/98
On May 23, 1998 at 04:38:19, Komputer Korner wrote:

>[snipped]

Thanks for your interest. You ask "why not do a 1-ply search?". The answer is that we wanted to perform the experiment with the deepest possible lookahead search that still allowed us to run tens of thousands of games on a desktop computer in a few weeks. The deeper the lookahead search, the better the quality of play given any particular set of piece values. The better the quality of play, the fewer games are necessary, and the closer the values are likely to be to those chosen for competitive play.

The goal of our experiment was not to re-invent piece values for the purpose of finding better ones for a competitive program, but to prove that the method worked, and to satisfy ourselves that it was robust over a wide range of parameter settings (more than we reported in the paper). Within that goal we chose the deepest searches we were willing to wait for.

I did not say the results were totally independent of depth. Beyond depth 3, the values obtained change only slightly as the search depth increases, but the CPU time per game goes up exponentially (of course).

It is not easy to predict the results of such experiments. One non-obvious effect is that the TD learning process can in principle (other things being equal) learn values that can only be deliberately exploited by searches deeper than those used in the self-play! This effect could arise because even if the play is poor (due to low search depth), a better position still offers a greater probability that the right play will be found by accident. The TD learning process works by relating *positions* to outcomes and does not require that the search sees the win. Over a large enough number of trials, the learning could respond to a statistically significant number of "accidental" wins from superior positions.

Hence it is quite possible the "low" knight value has more to do with the lack of other evaluation terms than with lack of search depth. Or even that the value "ought" to be low. I don't have enough information to guess the answer. Your guesses might be right. The only way to really know is to do more experiments.

Don Beal.

PS. To answer your other question: the program used in the matches was the same as that used in the learning runs: four ply plus quiescence, with randomised choice from tactically-equal moves.
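
For readers who want to see the mechanics, below is a minimal Python sketch (not the code we used) of how a TD-style update relates positions to game outcomes and adjusts piece values from self-play. The piece list, the feature encoding, the sigmoid scaling, and the alpha/lambda settings are illustrative assumptions only, not the parameters from the paper.

import math
import random

# Illustrative piece set; the weights are the learned piece values.
PIECES = ["pawn", "knight", "bishop", "rook", "queen"]

def predict(weights, features):
    # Map a signed material balance to an estimated win probability.
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def td_update(weights, positions, outcome, alpha=0.01, lam=0.7):
    # positions: one feature vector per position reached in the game
    #            (signed piece-count differences from the learner's side).
    # outcome:   1.0 for a win, 0.0 for a loss, 0.5 for a draw.
    # Each prediction is pulled toward later predictions, with the final
    # target being the game outcome -- the update never needs the search
    # to "see" the win, only the statistical link between positions and
    # results accumulated over many games.
    preds = [predict(weights, f) for f in positions] + [outcome]
    for t, features in enumerate(positions):
        td_error = sum((lam ** (k - t)) * (preds[k + 1] - preds[k])
                       for k in range(t, len(positions)))
        grad = preds[t] * (1.0 - preds[t])      # sigmoid derivative
        for i in range(len(weights)):
            weights[i] += alpha * td_error * grad * features[i]
    return weights

# Toy usage: random material balances standing in for self-play positions.
weights = [0.0] * len(PIECES)
game_positions = [[random.randint(-2, 2) for _ in PIECES] for _ in range(30)]
weights = td_update(weights, game_positions, outcome=1.0)
print(dict(zip(PIECES, (round(w, 4) for w in weights))))

Run over many self-play games, the weights settle toward values that best predict results from material alone, which is why missing evaluation terms (rather than search depth) can pull an individual piece value away from its conventional figure.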