Computer Chess Club Archives


Subject: Re: Fritz Rating on tournament

Author: Stephen A. Boak

Date: 19:08:26 05/19/00

On May 19, 2000 at 12:02:18, Dann Corbit wrote:

>On May 19, 2000 at 11:57:19, Chris Carson wrote:
>
>>On May 19, 2000 at 10:48:44, Osorio Meirelles wrote:
>>
>>>
>>>  What is Fritz's rating on the 9 tournament games where it actually
>>>  "played an opponent" (not counting the 4-mover game and the 0-mover)?
>>
>>Prog        HW    TPR  Opp  W D L Tot
>>Fritz SSS   4x500 2592 2548 3 4 2  9
>>
>>with one other game from last year's WCCC:
>>
>>Fritz 6/SSS 4x500 2635 2555 4 4 2  10
>
>You can establish an ELO but not a TPR by combining events.

Picky, picky, Dann.  :)  These physicists and mathematicians... That reminds
me of the joke about the three visitors to Scotland who saw the black
sheep--the philosopher, the engineer, and the physicist.  BTW, that joke
wasn't posted by *you*, was it?

1. You can establish an Average TPR (ATPR, hehe!) the way Chris combined two
events.  That's quite acceptable to me, and very understandable, whether or
not the acronym is altered--as long as the underlying data from which the
calculation is made is shown clearly, as Chris aptly provided.

   In my own 'TPR' or ELO-based calculations (USCF or FIDE human games, or
comp-comp, etc), I calculate GPR (Game Performance Rating), game by game, by the
use of the +/- 400 rule embodied in a simple Excel formula.  It is then easy on
a spreadsheet to tally the Average GPR for an event to get an event TPR; or to
tally the Average of GPRs from many events to obtain an ATPR over a particular
time frame.
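
   As an aside, here is roughly what that kind of Excel formula boils down
to, sketched in Python.  The example data is just Chris's combined line
above, with the simplifying assumption that every game is taken against the
2555 average opponent rating:

    def game_performance_rating(opp_rating, score):
        # The +/- 400 rule: opponent's rating plus 400 for a win,
        # minus 400 for a loss, unchanged for a draw.
        # score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
        return opp_rating + 400 * (2 * score - 1)

    def average_gpr(games):
        # games is a list of (opponent_rating, score) pairs.
        return sum(game_performance_rating(r, s) for r, s in games) / len(games)

    # Chris's combined line: +4 =4 -2 against a 2555 average opponent.
    event = [(2555, 1.0)] * 4 + [(2555, 0.5)] * 4 + [(2555, 0.0)] * 2
    print(average_gpr(event))   # 2555 + 400 * (4 - 2) / 10 = 2635.0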

   I especially like (for my personal USCF rating history) to calculate rolling
AGPRs for the last xx number of games, as well as White AGPRs and Black AGPRs
for the last yy White games and yy Black games.  I do this kind of stuff for my
overall, White, and Black AGPR for the last nn games against opponents with
ratings under 1800 USCF, ratings 1800-1999 USCF, and ratings 2000-2xxx USCF.
The breakdowns give me interesting data about my relative weakness or strength
with each color, and about my performance trends (over any arbitrary number of
games).
With enough games, I can even do this for particular openings I have played over
the course of several months or years.
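
   As a rough illustration of that bookkeeping (the record format, filters,
and window sizes here are all hypothetical; gpr() is the same +/- 400 rule
as in the sketch above):

    def gpr(opp_rating, score):
        # Same +/- 400 rule: score is 1.0 (win), 0.5 (draw), or 0.0 (loss).
        return opp_rating + 400 * (2 * score - 1)

    def agpr(games, color=None, band=None, last=None):
        # Average GPR, optionally filtered by color ('W' or 'B') and by an
        # opponent rating band (lo, hi), over the last `last` matching games.
        # games is a list of (color, opp_rating, score) tuples, oldest first.
        picked = [(r, s) for c, r, s in games
                  if (color is None or c == color)
                  and (band is None or band[0] <= r <= band[1])]
        if last is not None:
            picked = picked[-last:]
        return sum(gpr(r, s) for r, s in picked) / len(picked)

    # e.g. rolling White AGPR over my last 20 White games against
    # sub-1800 opposition:
    #   agpr(my_games, color='W', band=(0, 1799), last=20)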

   Of course a single GPR is meaningless (although good data) for establishing a
rating that is reasonably, ahem, well...established, i.e. that has a known
Standard Deviation.

2. You can't establish a 'good' moving (relatively current) ELO by always
combining events, no matter how long ago they took place.  A moving ELO is a
critical aspect and major idea in the underpinnings of the ELO system--to
establish a measure of current ability/performance that fluctuates up and down
to some degree according to the natural variation in both measurement processes
and human performance.

   When programs are under nearly constant development, with bugs
introduced and then patched, and hardware platforms similarly upgraded over
time, in many cases the recent hardware/software combination is not the
same as that of several months back.  In such cases I find it hard to
swallow (as meaningful) the AGPR of a company's software that has been
altered and played on many different platforms over a couple of years or
more.

   More interesting to me would be

   A. A graph of a rolling AGPR, calculated using the last xx number of games,
where xx is not so large that ancient versions on ancient hardware are averaged
with very current versions and very current hardware.  Doing this with Chris's
data for any one program (Rebel, Fritz, Junior, Shredder) with many comp-human
games would be very interesting, after enough games have occurred.

   This would help isolate current strength and strength trends--as the
hardware/software version combinations have improved, in general, over time.  It
could also be done for a group of strong programs, to test the notion that the
'top programs' are or are not at or near GM strength (today, now, recently--not
using results from 3 years ago, for example).

   B. A continuing ELO rating history (and graph, of course!), calculated
event by event, at the end of each discrete event.  This would be most
similar to how the USCF, I believe, calculates changing ratings for regular
tournament players.  I think it would be interesting to do this with
Chris's data, to see what ELO rating the top programs (or any one program)
have *achieved* after several comp-human tournaments.  By the mathematical
nature of the ELO system, this method would give less and less weight to
the older games as a program plays more and more humans over time.
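
   To make B concrete, the core of the textbook ELO update looks like the
sketch below.  (The real USCF algorithm adds K-factor rules, bonus points,
and rating floors, so treat this only as the underlying idea.)

    def expected_score(rating, opp_rating):
        # Standard ELO expectancy: 1 / (1 + 10^((Ro - R) / 400)).
        return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

    def update_after_event(rating, games, k=16):
        # games is a list of (opp_rating, score) pairs for one discrete
        # event.  The rating moves by K * (actual - expected), which is
        # why older events fade in influence as new events are rated.
        actual = sum(s for _, s in games)
        expected = sum(expected_score(rating, r) for r, _ in games)
        return rating + k * (actual - expected)

    # Rating history, one point per event:
    #   history = [initial_rating]
    #   for event in events:
    #       history.append(update_after_event(history[-1], event))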

   Just some thoughts and ideas.

   --Steve
