Author: Stephen A. Boak
Date: 19:08:26 05/19/00
On May 19, 2000 at 12:02:18, Dann Corbit wrote:

>On May 19, 2000 at 11:57:19, Chris Carson wrote:
>
>>On May 19, 2000 at 10:48:44, Osorio Meirelles wrote:
>>
>>>What is Fritz's rating in the 9 tournament games where it actually
>>>"played an opponent" (not counting the 4-mover and the 0-mover)?
>>
>>Prog         HW     TPR   Opp   W  D  L  Tot
>>Fritz SSS    4x500  2592  2548  3  4  2  9
>>
>>with one other game from last year's WCCC:
>>
>>Fritz 6/SSS  4x500  2635  2555  4  4  2  10
>
>You can establish an ELO but not a TPR by combining events.

Picky, picky, Dann. :) These physicists and mathematicians... It reminds me of the joke about the three visitors to Scotland who saw the black sheep--the philosopher, the engineer, and the physicist. BTW, that joke wasn't posted by *you*, was it?

1. You can establish an Average TPR (ATPR, hehe!) the way Chris combined two events. That is quite acceptable to me, and very understandable, whether the acronym is altered or not--when the underlying data from which the calculation is made is shown and is clear, as Chris aptly provided.

In my own TPR or ELO-based calculations (USCF or FIDE human games, comp-comp, etc.), I calculate a GPR (Game Performance Rating) game by game, using the +/- 400 rule embodied in a simple Excel formula (a rough code sketch of this arithmetic follows below). It is then easy on a spreadsheet to tally the average GPR for an event to get an event TPR, or to tally the average of GPRs from many events to obtain an ATPR over a particular time frame.

I especially like (for my personal USCF rating history) to calculate rolling AGPRs for the last xx number of games, as well as White and Black AGPRs for the last yy White games and yy Black games. I do this kind of thing for my overall, White, and Black AGPRs for the last nn games against opponents rated under 1800 USCF, 1800-1999 USCF, and 2000-2xxx USCF. The breakdowns give me interesting data about my relative weakness or strength with each color and about my performance trends (over any arbitrary number of games). With enough games, I can even do this for particular openings I have played over the course of several months or years.

Of course a single GPR is meaningless (although good data) for establishing a rating that is reasonably, ahem, well... established, i.e. one that has a known standard deviation.
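For anyone who wants to replicate the +/- 400 rule outside a spreadsheet, here is a rough Python sketch of what I mean. The function names are just my own labels for illustration, and results are encoded as 1 for a win, 0.5 for a draw, 0 for a loss:

def game_performance_rating(opp_rating, result):
    """GPR for one game under the +/- 400 rule: opponent's rating
    plus 400 for a win, minus 400 for a loss, unchanged for a draw."""
    if result == 1:        # win
        return opp_rating + 400
    elif result == 0:      # loss
        return opp_rating - 400
    else:                  # draw (0.5)
        return opp_rating

def average_gpr(games):
    """AGPR over a list of (opp_rating, result) pairs --
    feed it one event's games to get an event TPR."""
    gprs = [game_performance_rating(r, s) for r, s in games]
    return sum(gprs) / len(gprs)

def rolling_agpr(games, window):
    """Rolling AGPR over the last `window` games, game by game."""
    return [average_gpr(games[max(0, i - window + 1):i + 1])
            for i in range(len(games))]

# Example: a win vs a 2000, a draw vs a 2100, a loss vs a 1950
games = [(2000, 1), (2100, 0.5), (1950, 0)]
print(average_gpr(games))        # (2400 + 2100 + 1550) / 3 = 2016.67
print(rolling_agpr(games, 2))    # [2400.0, 2250.0, 1825.0]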
2. You can't establish a 'good' moving (relatively current) ELO by always combining events, no matter how long ago they took place. A moving ELO is a critical aspect and a major idea in the underpinnings of the ELO system: a measure of current ability/performance that fluctuates up and down to some degree with the natural variation in both the measurement process and human performance. Programs undergo developmental changes nearly constantly, bugs are introduced and then patched, and hardware platforms are upgraded similarly over time, so in many cases the recent hardware/software combination is not the same as that of several months back. In such cases I find it hard to swallow (as meaningful) the AGPR of a company's software that has been altered and played on many different platforms over a couple of years or more. More interesting to me would be:

A. A graph of a rolling AGPR, calculated using the last xx games, where xx is not so large that ancient versions on ancient hardware are averaged in with very current versions on very current hardware. Doing this with Chris's data for any one program (Rebel, Fritz, Junior, Shredder) with many comp-human games would be very interesting, after enough games have occurred. It would help isolate current strength and strength trends, since the hardware/software version combinations have, in general, improved over time. It could also be done for a group of strong programs, to test the notion that the 'top programs' are or are not at or near GM strength (today, now, recently--not using results from 3 years ago, for example).

B. A continuing ELO rating history (and graph, of course!), calculated event by event, at the end of each discrete event. This would be most similar to how the USCF (for example, I believe) recalculates ratings for regular tournament players. Doing this with Chris's data would be interesting, to see what ELO rating the top programs (or one program) have *achieved* after several comp-human tournaments. By the mathematical nature of the ELO system, this method gives less and less weight to older games as the program plays more and more humans over time. (A rough sketch of this kind of update appears in the PS below.)

Just some thoughts and ideas.

--Steve
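PS--Here is a minimal Python sketch of the event-by-event update in item B. It uses the basic logistic ELO formula with an assumed fixed K-factor of 24; the real USCF formula differs in its details, and the opponents/results in the example are made up for illustration:

def expected_score(own, opp):
    """Expected score against one opponent under the ELO model."""
    return 1 / (1 + 10 ** ((opp - own) / 400))

def update_after_event(rating, event_games, k=24):
    """New rating after one discrete event, given a list of
    (opp_rating, result) pairs with result in {1, 0.5, 0}."""
    actual = sum(result for _, result in event_games)
    expected = sum(expected_score(rating, opp) for opp, _ in event_games)
    return rating + k * (actual - expected)

# A rating history across several events: by construction, older
# events influence the current number less and less over time.
rating = 2500.0
history = [rating]
for event in [[(2548, 1), (2548, 0.5)], [(2555, 0.5), (2555, 1)]]:
    rating = update_after_event(rating, event)
    history.append(rating)
print(history)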