Author: Dann Corbit
Date: 11:54:10 04/05/04
On April 05, 2004 at 14:22:43, Dann Corbit wrote:
>On April 05, 2004 at 08:26:08, Andrew Wagner wrote:
>
>>
>>I'm taking this intro to calc-based statistics class this semester, and it got
>>me thinking. We tend to generate a lot of statistics during a game. In fact, we
>>could generate a LOT more. But we do very simple things with these statistics.
>>For example, we might add some bell or whistle to the search, notice that it
>>doesn't show any obvious increase in NPS or move ordering, or whatever we were
>>hoping for, and ditch it. That's an extremely simple analysis of the statistics.
>>I'm wondering if anybody has done any more advanced statistical work in figuring
>>out what factors are important to winning a game. Here are two examples I
>>thought of:
>>
>> 1.) Using advanced statistics to calculate better values to be used in
>>static eval. What if we took a ton of super-gm games, and generated a slew of
>>statistics about each game like number of moves white has an isolated pawn on,
>>or number of moves black keeps his king-knight, and so on. Then do a
>>multi-variate analysis to determine whether any of those things have an effect
>>on the probability that white or black will win. Unfortunately, this math is a
>>little above me at this point, and maybe someone else has tried it, I don't
>>know. It's an interesting idea to think about, though.
>>
>> 2.) Some quantitative analysis of the speed vs. knowledge question. We have
>>the speed number readily available, in the form of NPS. If we could find some
>>way to quantify the amount of knowledge we build into an engine, we could easily
>>find some kind of ideal balance. Think about it: this kind of thing is done all
>>the time in many fields. Mathematically, this kind of optimization should not be
>>all that hard, if we can come up with good numbers to use.
>>
>> So, what do you think? Anybody heard of such a project before? Do you all
>>think it would work? I'd be interested in any thoughts you have. Andrew
>
>I have code that does a parabolic least squares fit for evaluation parameters.
>You can tune it to do anything that way, but the problem is that whatever you
>tune for is exactly what it ends up being good at. So (for instance) I can tune
>it to test suites and it will slaughter them, but in real games it takes a pounding.
#include <math.h>
#include <float.h>
#include <stdio.h>   /* FILE, fopen, fprintf, printf used by iterate_pers() below */
/***************************************************
 * Parabolic Least Squares Demonstration Program
 * -------------------------------------------------
 * Reference: BASIC Scientific Subroutines, Vol. II
 * by F.R. Ruckdeschel, BYTE/McGRAW-HILL, 1981.
 * ISBN 0-07-054202-3 (v. 2), Pages 24-31
 *
 * C++ version by J-P Moreau, Paris
 * Subsequently translated to C and heavily altered
 * by Dann Corbit
 * -------------------------------------------------
 * This program calculates a parabolic least squares
 * fit to a given data set.
 *
 * INSTRUCTIONS
 * ------------
 * 1. More than three data points must be provided.
 * 2. The data must not be a single point repeated.
 * 3. The data must not be perfectly collinear.
 * If any of these 3 rules is violated, an error
 * flag is set; the returned data is then invalid
 * and must not be used.
 *
 * SAMPLE DATA:
 *
 * Number of data points : 4
 *
 * X[0]=1  Y[0]=1
 * X[1]=2  Y[1]=4
 * X[2]=3  Y[2]=9
 * X[3]=5  Y[3]=24.95
 *
 * Fitted equation is:
 * Y = -0.017727 + 0.022045 X + 0.994318 X^2
 *
 * Standard deviation of fit: 0.004767
 ***************************************************/
#define DATA_OK 0 // Everything is A-OK
#define NOT_ENOUGH_DATA 1 // Underdetermined system
#define SINGLE_POINT 2 // Degenerate data
#define STRAIGHT_LINE 3 // Degenerate data
/****************************************************************
 * Parabolic least squares estimation
 * --------------------------------------------------------------
 * In:  unsigned n = number of points
 *      n values xd[i], yd[i]
 * Out: coefficients a,b,c of fit (a+b*x+c*x^2)
 *      coefs[0] = multiplier for x^0 {Constant term}
 *      coefs[1] = multiplier for x^1 {Linear term}
 *      coefs[2] = multiplier for x^2 {Quadratic term}
 *      coefs[3] = Standard deviation {How good is the fit?}
 *      coefs[4] = x location of the extrapolated minimum
 * Returns: the x location of the minimum of the fitted parabola.
 ****************************************************************/
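/* The fit below minimizes SUM (y[i] - a - b*x[i] - c*x[i]^2)^2 over the
 * coefficients a, b and c.  Setting the three partial derivatives to zero
 * gives a 3x3 linear system in the averaged moments of x and y, which is
 * solved with Cramer's rule; d is the determinant of that system. */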
double plsqf(double *xd, double *yd, unsigned n, double *coefs, int *error)
{
    double a0, a1, a2, a3, a4;
    double b0, b1, b2;
    double d1;
    unsigned i;
    double a, b, c, d, e;
    *error = DATA_OK;   // Assume that there are no problems
    // Check for not enough data (more than three points are required;
    // the standard deviation below divides by n - 3)...
    if (n < 4) {
        *error = NOT_ENOUGH_DATA;
        return 0;
    }
    a0 = 1;
    a1 = 0;
    a2 = 0;
    a3 = 0;
    a4 = 0;
    b0 = 0;
    b1 = 0;
    b2 = 0;
    // Accumulate the moment sums: a1..a4 are sums of powers of x,
    // b0..b2 are sums of y, x*y and x^2*y.
    for (i = 0; i < n; i++) {
        double xi2 = xd[i] * xd[i];
        double xi4 = xi2 * xi2;
        double xy = xd[i] * yd[i];
        a1 += xd[i];
        a2 += xi2;
        a3 += xi2 * xd[i];
        a4 += xi4;
        b0 += yd[i];
        b1 += xy;
        b2 += xy * xd[i];
    }
    // Convert the sums to averages (a0 stays 1).
    a1 /= n;
    a2 /= n;
    a3 /= n;
    a4 /= n;
    b0 /= n;
    b1 /= n;
    b2 /= n;
    d = a0 * (a2 * a4 - a3 * a3) - a1 * (a1 * a4 - a2 * a3)
        + a2 * (a1 * a3 - a2 * a2);
    // Check for {near} singularity (all data is the same point)
    if (fabs(d) < (DBL_EPSILON * 10.0)) {
        *error = SINGLE_POINT;
        return 0;
    }
    a = (b0 * (a2 * a4 - a3 * a3) + b1 * (a2 * a3 - a1 * a4)
         + b2 * (a1 * a3 - a2 * a2)) / d;
    b = (b0 * (a2 * a3 - a1 * a4) + b1 * (a0 * a4 - a2 * a2)
         + b2 * (a1 * a2 - a0 * a3)) / d;
    c = (b0 * (a1 * a3 - a2 * a2) + b1 * (a2 * a1 - a0 * a3)
         + b2 * (a0 * a2 - a1 * a1)) / d;
    // Check for {near} singularity (the data is perfectly linear)
    if (fabs(c) < (DBL_EPSILON * 10.0)) {
        *error = STRAIGHT_LINE;
        return 0;
    }
    // Evaluation of standard deviation d
    d = 0;
    for (i = 0; i < n; i++) {
        d1 = yd[i] - a - b * xd[i] - c * xd[i] * xd[i];
        d += d1 * d1;
    }
    d = sqrt(d / (n - 3.0));
    // Calculation of the minimum for x
    e = -b / (c * 2.0);
    // Load the constants into the return array
    coefs[0] = a;
    coefs[1] = b;
    coefs[2] = c;
    coefs[3] = d;
    coefs[4] = e;
    return e;   // Return the x location for the minimum
}
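For reference, a tiny stand-alone driver along these lines (not part of the engine; the sample data and the expected output come straight from the header comment above) shows the calling convention:

/* Minimal stand-alone test of plsqf() using the sample data above. */
int main(void)
{
    double x[4] = {1, 2, 3, 5};
    double y[4] = {1, 4, 9, 24.95};
    double coefs[5];
    int error;
    double xmin = plsqf(x, y, 4, coefs, &error);
    if (error != DATA_OK) {
        printf("Fit failed, error code %d\n", error);
        return 1;
    }
    printf("Y = %f + %f X + %f X^2\n", coefs[0], coefs[1], coefs[2]);
    printf("Standard deviation: %f, minimum at x = %f\n", coefs[3], xmin);
    return 0;
}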
/* For each search depth, sample each tunable parameter at evenly spaced
   values, fit a parabola to the test results, and move the parameter to
   the fitted minimum (or to the best sampled point if the fit fails). */
void iterate_pers(void)
{
    double x[40], y[40];
    double coefs[5];
    int error;
    double ans;
    unsigned count;
    int depth;
    FILE *out;

    init_personality();
    for (depth = 4; depth <= 10; depth++) {
        which_param = 0;        /* change to zero to do them all... */
      next:
        count = 0;
        ph = phunk_list[which_param];
        if (which_param >= 4)
            goto loop;
        if (ph.p) {
            which_delta = (ph.maxval - ph.minval) / 6;
            if (which_delta == 0) {
                which_param++;
                goto next;
            }
            out = fopen("calibration.dat", "a");
            if (out == NULL)
                out = stderr;
            ph.ploc[0] = ph.minval;
            while (ph.maxval >= ph.ploc[0]) {
                ph.ploc[0] += which_delta;
                x[count] = ph.ploc[0];
                y[count] = do_test(depth);
                putchar('!');
                count++;
            }
            ans = plsqf(x, y, count, coefs, &error);
            if (error == 0) {
                ans = ans + 0.5;   /* round before the implicit truncation to int */
                if (ans > ph.maxval)
                    ans = ph.maxval;
                if (ans < ph.minval)
                    ans = ph.minval;
                ph.ploc[0] = ans;
                fprintf(out, "%s=%d at %d; stddev=%f\n", ph.p, ph.ploc[0],
                        depth, coefs[3]);
                printf("\n%s=%d at %d; stddev=%f\n", ph.p, ph.ploc[0],
                       depth, coefs[3]);
                fflush(out);
            } else {
                /* No usable fit: fall back to the best sampled point. */
                unsigned i;
                double miny, minx;
                printf("\nBad data this pass -- no fit. (%s): ", ph.p);
                miny = y[0];
                minx = x[0];
                for (i = 0; i < count; i++) {
                    printf("(x=%f, y=%f), ", x[i], y[i]);
                    if (y[i] < miny) {
                        miny = y[i];
                        minx = x[i];
                    }
                }
                ph.ploc[0] = minx;
                fprintf(out, "%s=%d at %d; {badfit}\n", ph.p, ph.ploc[0],
                        depth);
                printf("\n%s=%d at %d; {badfit}\n", ph.p, ph.ploc[0], depth);
                putchar('\n');
            }
            if (out != stderr)   /* do not close stderr */
                fclose(out);
            which_param++;
            goto next;
        }
      loop: ;
    }
}
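iterate_pers() leans on several engine-specific pieces that are not shown here: phunk_list, ph, which_param, which_delta, noisy, init_personality() and do_test(). Purely to illustrate the shapes the routine assumes (the struct name and the field types below are guesses, not the actual engine code), they might look something like this:

/* Hypothetical shapes only -- the real definitions live in the engine. */
typedef struct {
    const char *p;   /* parameter name; NULL/empty marks the end of the list */
    int minval;      /* smallest legal value for this parameter              */
    int maxval;      /* largest legal value for this parameter               */
    int *ploc;       /* points at the evaluation term being tuned            */
} phunk;

extern phunk phunk_list[];     /* table of tunable evaluation parameters */
extern phunk ph;               /* the parameter currently being sampled  */
extern int which_param;        /* index into phunk_list                  */
extern int which_delta;        /* step size between sampled values       */
extern int noisy;              /* 1 = print analysis output, 0 = silent  */
void init_personality(void);   /* load the default parameter values      */
double do_test(int depth);     /* run the test set; smaller is better    */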
This goes into the parser:
if (!strcmp(first, "CALIBRATE")) {
    noisy = 0;
    iterate_pers();
    noisy = 1;
    return S_NOP;
}
The variable noisy turns analysis output off when it is set to 0 and back on when it is set to 1.