Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Chess programming: A statistical approach

Author: Dann Corbit

Date: 11:54:10 04/05/04

Go up one level in this thread


On April 05, 2004 at 14:22:43, Dann Corbit wrote:

>On April 05, 2004 at 08:26:08, Andrew Wagner wrote:
>
>>
>>I'm taking this intro to calc-based statistics class this semester, and it got
>>me thinking. We tend to generate a lot of statistics during a game. In fact, we
>>could generate a LOT more. But we do very simple things with these statistics.
>>For example, we might add some bell or whistle to the search, notice that it
>>doesn't show any obvious increase in NPS or move ordering, or whatever we were
>>hoping for, and ditch it. That's an extremely simple analysis of the statistics.
>>I'm wondering if anybody has done any more advanced statistical work in figuring
>>out what factors are important to winning a game. Here are two examples I
>>thought of:
>>
>>    1.) Using advanced statistics to calculate better values to be used in
>>static eval. What if we took a ton of super-gm games, and generated a slew of
>>statistics about each game like number of moves white has an isolated pawn on,
>>or number of moves black keeps his king-knight, and so on. Then do a
>>multi-variate analysis to determine whether any of those things have an effect
>>on the probability that white or black will win. Unfortunately, this math is a
>>little above me at this point, and maybe someone else has tried it, I don't
>>know. It's an interesting idea to think about, though.
>>
>>    2.) Some quantitative analysis of the speed vs. knowledge question. We have
>>the speed number readily available, in the form of NPS. If we could find some
>>way to quantify the amount of knowledge we build into an engine, we could easily
>>find some kind of ideal balance. Think about it: this kind of thing is done all
>>the time in many fields. Mathematically, this kind of optimization should not be
>>all that hard, if we can come up with good numbers to use.
>>
>>     So, what do you think? Anybody heard of such a project before? Do you all
>>think it would work? I'd be interested in any thoughts you have. Andrew
>
>I have code that does a parabolic least squares fit for evaluation parameters.
>You can tune it to do anything that way, but the problem is --> that will be
>what it is good at.  So (for instance) I can tune it to test suites and it will
>slaughter them, but in real games it takes a pounding.

#include <math.h>
#include <float.h>

/***************************************************
*  Parabolic Least Squares Demonstration Program   *
* ------------------------------------------------ *
* Reference: BASIC Scientific Subroutines, Vol. II *
*   by F.R. Ruckdeschel, BYTE/McGRAWW-HILL, 1981.  *
*   ISBN 0-07-054202-3 (v. 2), Pages 24-31         *
*                                                  *
*              C++ version by J-P Moreau, Paris    *
* Subsequently translated to C and heavily altered *
* by Dann Corbit                                   *
* ------------------------------------------------ *
* This program calculates a parabolic least squares*
* fit to a given data set.                         *
*                                                  *
* INSTRUCTIONS                                     *
* ------------                                     *
* 1.  The number of data coordinates provided must *
*     be greater than three points.                *
* 2.  The data must not be a single point, which   *
*     is repeated.                                 *
* 3.  The data must not be perfectly colinear.     *
* If any of these 3 rules are violated, an error   *
* flag will be set.  The returned data will be     *
* invalid and must not be used.                    *
*                                                  *
* SAMPLE DATA:                                     *
*                                                  *
* Number of data points : 4                        *
*                                                  *
*   X[0]=1  Y[0]=1                                 *
*   X[1]=2  Y[1]=4                                 *
*   X[2]=3  Y[2]=9                                 *
*   X[3]=5  Y[3]=24.95                             *
*                                                  *
* Fitted equation is:                              *
*   Y = -0.017727 + 0.022045 X + 0.994318 X^2      *
*                                                  *
* Standard deviation of fit: 0.004767              *
***************************************************/
#define DATA_OK         0 // Everything is A-OK

#define NOT_ENOUGH_DATA 1 // Underdetermined system

#define SINGLE_POINT    2 // Degenerate data

#define STRAIGHT_LINE   3 // Degenerate data

 /****************************************************************
 *           Parabolic least squares estimation                  *
 * ------------------------------------------------------------- *
 * In:  unsigned n = number of points                            *
 *      n values xd[i], yd[i]                                    *
 * Out: coefficients a,b,c of fit (a+b*x+c*x^2)                  *
 *      coefs[0] = multiplier for x^0 {Constant term}            *
 *      coefs[1] = multiplier for x^1 {Linear term}              *
 *      coefs[2] = multiplier for x^2 {Quadratic term}           *
 *      coefs[3] = Standard deviation {How good is the fit?}     *
 *      coefs[4] = x location for the extrapolcated minimum      *
 * Returns: The location of the minimum of the parabola at x.    *
 ****************************************************************/

double          plsqf(double *xd, double *yd, unsigned n, double *coefs, int
*error)
{
    double          a0;
    double          a1;
    double          a2,
                    a3,
                    a4,
                    b0,
                    b1,
                    b2,
                    d1;
    unsigned        i;
    double          a,
                    b,
                    c,
                    d,
                    e;
    *error = DATA_OK;           // Assume that there are no problems

    // Check for not enough data...
    if (n < 3) {
        *error = NOT_ENOUGH_DATA;
        return 0;
    }
    a0 = 1;
    a1 = 0;
    a2 = 0;
    a3 = 0;
    a4 = 0;
    b0 = 0;
    b1 = 0;
    b2 = 0;
    for (i = 0; i < n; i++) {
        double          xi2 = xd[i] * xd[i];
        double          xi4 = xi2 * xi2;
        double          xy = xd[i] * yd[i];
        a1 += xd[i];
        a2 += xi2;
        a3 += xi2 * xd[i];
        a4 += xi4;
        b0 += yd[i];
        b1 += xy;
        b2 += xy * xd[i];
    }
    a1 /= n;
    a2 /= n;
    a3 /= n;
    a4 /= n;
    b0 /= n;
    b1 /= n;
    b2 /= n;
    d = a0 * (a2 * a4 - a3 * a3) - a1 * (a1 * a4 - a2 * a3) + a2 * (a1 * a3 - a2
* a2);

    // Check for {near} singularity (all data is the same point)
    if (fabs(d) < (DBL_EPSILON * 10.0)) {
        *error = SINGLE_POINT;
        return 0;
    }
    a = (b0 * (a2 * a4 - a3 * a3) + b1 * (a2 * a3 - a1 * a4) + b2 * (a1 * a3 -
a2 * a2)) / d;
    b = (b0 * (a2 * a3 - a1 * a4) + b1 * (a0 * a4 - a2 * a2) + b2 * (a1 * a2 -
a0 * a3)) / d;
    c = (b0 * (a1 * a3 - a2 * a2) + b1 * (a2 * a1 - a0 * a3) + b2 * (a0 * a2 -
a1 * a1)) / d;

    // Check for {near} singularity (the data is perfectly linear)
    if (fabs(c) < (DBL_EPSILON * 10.0)) {
        *error = STRAIGHT_LINE;
        return 0;
    }
    // Evaluation of standard deviation d
    d = 0;
    for (i = 0; i < n; i++) {
        d1 = yd[i] - a - b * xd[i] - c * xd[i] * xd[i];
        d += d1 * d1;
    }
    d = sqrt(d / (n - 3.0));

    // Calculation of the minimum for x
    e = -b / (c * 2.0);

    // Load the constants into the return array
    coefs[0] = a;
    coefs[1] = b;
    coefs[2] = c;
    coefs[3] = d;
    coefs[4] = e;
    return e;                   // Return the x location for the minimum
}



void            iterate_pers(void)
{
    double          x[40],
                    y[40];
    double          coefs[5];
    int             error;
    double          ans;
    unsigned        count;
    int             depth;
    FILE           *out;
    init_personality();
    for (depth = 4; depth <= 10; depth++) {

        which_param = 0;        /* change to zero to do them all... */
next:
        count = 0;
        ph = phunk_list[which_param];
        if (which_param >= 4)
            goto loop;
        if (ph.p) {
            which_delta = (ph.maxval - ph.minval) / 6;
            if (which_delta == 0) {
                which_param++;
                goto next;
            }
            out = fopen("calibration.dat", "a");
            if (out == NULL)
                out = stderr;
            ph.ploc[0] = ph.minval;
            while (ph.maxval >= ph.ploc[0]) {
                ph.ploc[0] += which_delta;
                x[count] = ph.ploc[0];
                y[count] = do_test(depth);
                putchar('!');
                count++;
            }
            ans = plsqf(x, y, count, coefs, &error);
            if (error == 0) {
                ans = ans + 0.5;
                if (ans > ph.maxval)
                    ans = ph.maxval;
                if (ans < ph.minval)
                    ans = ph.minval;

                ph.ploc[0] = ans;
                fprintf(out, "%s=%d at %d; stddev=%f\n", ph.p, ph.ploc[0],
depth, coefs[3]);
                printf("\n%s=%d at %d; stddev=%f\n", ph.p, ph.ploc[0], depth,
coefs[3]);
                fflush(out);
            } else {
                int             i;
                double          miny,
                                minx;
                printf("\nBad data this pass -- no fit. (%s): ", ph.p);
                miny = y[0];
                minx = x[0];
                for (i = 0; i < count; i++) {
                    printf("(x=%f, y=%f), ", x[i], y[i]);
                    if (y[i] < miny) {
                        miny = y[i];
                        minx = x[i];
                    }
                }
                ph.ploc[0] = minx;
                fprintf(out, "%s=%d at %d; {badfit}\n", ph.p, ph.ploc[0],
depth);
                printf("\n%s=%d at %d; {badfit}\n", ph.p, ph.ploc[0], depth);

                putchar('\n');
            }
            fclose(out);
            which_param++;

            goto next;
        }
loop:   ;
    }

}

This goes into the parser:

        if (!strcmp(first, "CALIBRATE")) {
            noisy = 0;
            iterate_pers();
            noisy = 1;
            return S_NOP;
        }

The variable noisy turns off printing of output analysis if zero and on if 1.




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.