Author: Dann Corbit
Date: 11:54:10 04/05/04
On April 05, 2004 at 14:22:43, Dann Corbit wrote:

>On April 05, 2004 at 08:26:08, Andrew Wagner wrote:
>
>>I'm taking this intro to calc-based statistics class this semester, and it got
>>me thinking. We tend to generate a lot of statistics during a game. In fact, we
>>could generate a LOT more. But we do very simple things with these statistics.
>>For example, we might add some bell or whistle to the search, notice that it
>>doesn't show any obvious increase in NPS or move ordering, or whatever we were
>>hoping for, and ditch it. That's an extremely simple analysis of the
>>statistics. I'm wondering if anybody has done any more advanced statistical
>>work in figuring out what factors are important to winning a game. Here are
>>two examples I thought of:
>>
>>1.) Using advanced statistics to calculate better values to be used in static
>>eval. What if we took a ton of super-gm games, and generated a slew of
>>statistics about each game, like number of moves white has an isolated pawn
>>on, or number of moves black keeps his king-knight, and so on. Then do a
>>multi-variate analysis to determine whether any of those things have an effect
>>on the probability that white or black will win. Unfortunately, this math is a
>>little above me at this point, and maybe someone else has tried it, I don't
>>know. It's an interesting idea to think about, though.
>>
>>2.) Some quantitative analysis of the speed vs. knowledge question. We have
>>the speed number readily available, in the form of NPS. If we could find some
>>way to quantify the amount of knowledge we build into an engine, we could
>>easily find some kind of ideal balance. Think about it: this kind of thing is
>>done all the time in many fields. Mathematically, this kind of optimization
>>should not be all that hard, if we can come up with good numbers to use.
>>
>>So, what do you think? Anybody heard of such a project before? Do you all
>>think it would work? I'd be interested in any thoughts you have.
>>
>>Andrew
>
>I have code that does a parabolic least squares fit for evaluation parameters.
>You can tune it to do anything that way, but the problem is that whatever you
>tune it on is exactly what it will be good at. So (for instance) I can tune it
>to test suites and it will slaughter them, but in real games it takes a
>pounding.

#include <math.h>
#include <float.h>
#include <stdio.h>   /* for FILE, fopen, fprintf, printf in iterate_pers() below */

/***************************************************
 * Parabolic Least Squares Demonstration Program
 * ------------------------------------------------
 * Reference: BASIC Scientific Subroutines, Vol. II
 * by F.R. Ruckdeschel, BYTE/McGRAW-HILL, 1981.
 * ISBN 0-07-054202-3 (v. 2), Pages 24-31
 *
 * C++ version by J-P Moreau, Paris
 * Subsequently translated to C and heavily altered
 * by Dann Corbit
 * ------------------------------------------------
 * This program calculates a parabolic least squares
 * fit to a given data set.
 *
 * INSTRUCTIONS
 * ------------
 * 1. More than three data points must be provided.
 * 2. The data must not be a single point, repeated.
 * 3. The data must not be perfectly collinear.
 * If any of these 3 rules is violated, an error
 * flag will be set, and the returned data will be
 * invalid and must not be used.
 *
 * SAMPLE DATA:
 *
 * Number of data points: 4
 *
 * X[0]=1  Y[0]=1
 * X[1]=2  Y[1]=4
 * X[2]=3  Y[2]=9
 * X[3]=5  Y[3]=24.95
 *
 * Fitted equation is:
 * Y = -0.017727 + 0.022045 X + 0.994318 X^2
 *
 * Standard deviation of fit: 0.004767
 ***************************************************/

#define DATA_OK         0   // Everything is A-OK
#define NOT_ENOUGH_DATA 1   // Underdetermined system
#define SINGLE_POINT    2   // Degenerate data
#define STRAIGHT_LINE   3   // Degenerate data

/****************************************************************
 * Parabolic least squares estimation
 * -------------------------------------------------------------
 * In:  unsigned n = number of points
 *      n values xd[i], yd[i]
 * Out: coefficients a, b, c of the fit (a + b*x + c*x^2)
 *      coefs[0] = multiplier for x^0 {constant term}
 *      coefs[1] = multiplier for x^1 {linear term}
 *      coefs[2] = multiplier for x^2 {quadratic term}
 *      coefs[3] = standard deviation {how good is the fit?}
 *      coefs[4] = x location of the extrapolated extremum
 * Returns: the x location of the extremum of the parabola
 *          (a minimum when c > 0).
 ****************************************************************/
double plsqf(double *xd, double *yd, unsigned n, double *coefs, int *error)
{
    double a0;
    double a1, a2, a3, a4, b0, b1, b2, d1;
    unsigned i;
    double a, b, c, d, e;

    *error = DATA_OK;  // Assume that there are no problems

    // Check for not enough data...
    if (n < 4) {  // need more than 3 points: the stddev below divides by n - 3
        *error = NOT_ENOUGH_DATA;
        return 0;
    }
    a0 = 1;
    a1 = 0;
    a2 = 0;
    a3 = 0;
    a4 = 0;
    b0 = 0;
    b1 = 0;
    b2 = 0;
    for (i = 0; i < n; i++) {
        double xi2 = xd[i] * xd[i];
        double xi4 = xi2 * xi2;
        double xy = xd[i] * yd[i];
        a1 += xd[i];
        a2 += xi2;
        a3 += xi2 * xd[i];
        a4 += xi4;
        b0 += yd[i];
        b1 += xy;
        b2 += xy * xd[i];
    }
    a1 /= n;
    a2 /= n;
    a3 /= n;
    a4 /= n;
    b0 /= n;
    b1 /= n;
    b2 /= n;
    d = a0 * (a2 * a4 - a3 * a3) - a1 * (a1 * a4 - a2 * a3) +
        a2 * (a1 * a3 - a2 * a2);
    // Check for {near} singularity (all data is the same point)
    if (fabs(d) < (DBL_EPSILON * 10.0)) {
        *error = SINGLE_POINT;
        return 0;
    }
    a = (b0 * (a2 * a4 - a3 * a3) + b1 * (a2 * a3 - a1 * a4) +
         b2 * (a1 * a3 - a2 * a2)) / d;
    b = (b0 * (a2 * a3 - a1 * a4) + b1 * (a0 * a4 - a2 * a2) +
         b2 * (a1 * a2 - a0 * a3)) / d;
    c = (b0 * (a1 * a3 - a2 * a2) + b1 * (a2 * a1 - a0 * a3) +
         b2 * (a0 * a2 - a1 * a1)) / d;
    // Check for {near} singularity (the data is perfectly linear)
    if (fabs(c) < (DBL_EPSILON * 10.0)) {
        *error = STRAIGHT_LINE;
        return 0;
    }
    // Evaluation of the standard deviation of the fit
    d = 0;
    for (i = 0; i < n; i++) {
        d1 = yd[i] - a - b * xd[i] - c * xd[i] * xd[i];
        d += d1 * d1;
    }
    d = sqrt(d / (n - 3.0));
    // Calculation of the x location of the extremum
    e = -b / (c * 2.0);
    // Load the constants into the return array
    coefs[0] = a;
    coefs[1] = b;
    coefs[2] = c;
    coefs[3] = d;
    coefs[4] = e;
    return e;  // Return the x location of the extremum
}

void iterate_pers(void)
{
    double x[40], y[40];
    double coefs[5];
    int error;
    double ans;
    unsigned count;
    int depth;
    FILE *out;

    init_personality();
    for (depth = 4; depth <= 10; depth++) {
        which_param = 0;  /* change to zero to do them all... */
    next:
        count = 0;
        ph = phunk_list[which_param];
        if (which_param >= 4)
            goto loop;
        if (ph.p) {
            which_delta = (ph.maxval - ph.minval) / 6;
            if (which_delta == 0) {
                which_param++;
                goto next;
            }
            out = fopen("calibration.dat", "a");
            if (out == NULL)
                out = stderr;
            ph.ploc[0] = ph.minval;
            while (ph.maxval >= ph.ploc[0]) {
                ph.ploc[0] += which_delta;
                x[count] = ph.ploc[0];
                y[count] = do_test(depth);
                putchar('!');
                count++;
            }
            ans = plsqf(x, y, count, coefs, &error);
            if (error == 0) {
                ans = ans + 0.5;  // round to the nearest integer setting
                if (ans > ph.maxval)
                    ans = ph.maxval;
                if (ans < ph.minval)
                    ans = ph.minval;
                ph.ploc[0] = ans;
                fprintf(out, "%s=%d at %d; stddev=%f\n", ph.p, ph.ploc[0], depth, coefs[3]);
                printf("\n%s=%d at %d; stddev=%f\n", ph.p, ph.ploc[0], depth, coefs[3]);
                fflush(out);
            } else {
                int i;
                double miny, minx;
                printf("\nBad data this pass -- no fit. (%s): ", ph.p);
                miny = y[0];
                minx = x[0];
                for (i = 0; i < count; i++) {
                    printf("(x=%f, y=%f), ", x[i], y[i]);
                    if (y[i] < miny) {
                        miny = y[i];
                        minx = x[i];
                    }
                }
                ph.ploc[0] = minx;  // no fit: fall back to the best sampled point
                fprintf(out, "%s=%d at %d; {badfit}\n", ph.p, ph.ploc[0], depth);
                printf("\n%s=%d at %d; {badfit}\n", ph.p, ph.ploc[0], depth);
                putchar('\n');
            }
            if (out != stderr)  // don't close stderr if the log file failed to open
                fclose(out);
            which_param++;
            goto next;
        }
    loop:
        ;
    }
}

This goes into the parser:

if (!strcmp(first, "CALIBRATE")) {
    noisy = 0;
    iterate_pers();
    noisy = 1;
    return S_NOP;
}

The variable noisy turns printing of the analysis output off when it is zero and on when it is 1.
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.