Author: Roger D Davis
Date: 10:01:16 10/10/04
On October 10, 2004 at 04:52:32, GuyHaworth wrote:

>See http://www.talkchess.com/forums/1/message.html?390944
>
>Kurt Utzinger has calculated the 'tournament performance ratings', i.e. most
>likely ELOs. According to him:
>
>  FRITZ, rated at 2700, comes out +308 at 3008
>  HYDRA, rated at 2700, comes out +341 at 3041
>    even though it comes 2nd on a countback of opponents' game scores here
>  Topalov, FIDE ELO 2757, comes out -149 at 2608
>  DEEP JUNIOR, rated at 2700, comes out -092 at 2608
>  Ponomariov, FIDE ELO 2710, comes out -209 at 2501
>  Karjakin, FIDE ELO 2576, comes out -075 at 2501
>
>A player's TPR is not affected by the ELO they go in at. In fact, the engines'
>TPRs are 'FIDE ELOs' as all their opponents have FIDE ELOs, whereas the humans'
>TPRs are 'SSDF ELOs', not to be compared with their original FIDE ELOs.
>
>Ponomariov and Karjakin both scored 1, and both TPR at 2501 because all their
>opponents were rated at 2700 before the tournament.
>
>Topalov and DEEP JUNIOR both scored 1.5, and both TPR at 2608, though the reason
>is more interesting. The average ELO of DEEP JUNIOR's opponents was 2700,
>obviously a coincidence.
>
>The machines have an average TPR of 2853.
>
>But these TPRs are only the 'most likely' TPRs and I don't have a way of saying
>how likely. SSDF gives a +- band for its 'SSDF ELO' ratings, so, e.g.,
>
>  SHREDDER 8.0 CB is 2818 (+34, -32) with 70% from 481 games
>    ... against opponents averaging an 'SSDF ELO' rating of 2673
>
>  SHREDDER 7.04 UCI is 2809 (+24, -23) with 71% from 967 games
>    ... against opponents averaging an 'SSDF ELO' rating of 2648
>
>The interval is defined so that the actual 'SSDF ELO' of the engine has a
>probability of 0.95, i.e. is 95% likely, to fall in the given band. It would be
>good if FIDE would do the same. Confidence limits for lower confidence levels
>can be calculated by standard maths.
>
>Note that 2x the games (as above) divides the width of the band by
>~square_root(2). It would need 4x the games to divide the width of the band by
>~2.
>
>In this tournament we have 4 games rather than ~512, i.e. 2^7 times fewer.
>
>So - not at all rigorously - I would expect the 95%-confidence-interval band for
>the TPR to be square_root(2^7) = 11.314x wider than +-33, i.e. +-373.
>
>i.e. One might say that:
>
>  FRITZ is 95% likely to have a 'FIDE ELO' of 3008 +- 373, (2635, 3381)
>  HYDRA is 95% likely to have a 'FIDE ELO' of 3041 +- 373, (2668, 3414)
>  DJ is 95% likely to have a 'FIDE ELO' of 2608 +- 373, (2235, 2981)
>
>The TPRs are much less significant than might at first seem.
>
>Maybe if someone has the proper ELO-calculating software, these 95% confidence
>intervals can be superseded by the real ones.
>
>I'm interested in the likelihood that the engines have a 'FIDE ELO' >= 2700. I
>think we can say that, on the evidence, and against carbon rather than silicon
>competition:
>
>  HYDRA and FRITZ are certainly 90% likely to be over 2700
>  DEEP JUNIOR is [clearly] over 50% likely to be less than 2700
>
>These numbers change a lot on a half-point just missed or just gained. Machines
>don't blunder by missing a tactic in the same way as humans, and maybe the
>humans did so in this tournament: I don't know.
>
>To see how well the machines were prepared to play the opponent as well as the
>game, at least in the opening, one has to look at how they seem to emerge at say
>move 15.
>
>I haven't studied the games to see precisely what happened. DEEP JUNIOR had the
>only machine loss, to Karjakin, so I'd ask why that was. Amir/Shay work very hard
>to prepare for specific opponents, so their showing here with DEEP JUNIOR is a
>surprise to me.
>
>g

Thanks for the very detailed answer...much appreciated. :)

Roger
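
For anyone who wants to redo the back-of-envelope arithmetic in the quoted post, here is a minimal sketch. It takes the +-33 band and the ~512 games from the post, scales the band by the square root of the games ratio, and then estimates the chance of a rating >= 2700. The second step assumes the rating error is normally distributed around the TPR; that assumption and the function names are mine, not something the post or SSDF states, so treat the output as a rough check rather than proper ELO-software results.

```python
import math
from statistics import NormalDist

# Figures taken from the quoted post.
SSDF_HALF_WIDTH = 33.0   # approx. 95% half-width quoted for SHREDDER 8.0 CB
SSDF_GAMES = 512         # the post rounds the SSDF game count to ~512
TOURNAMENT_GAMES = 4
TPRS = {"FRITZ": 3008, "HYDRA": 3041, "DEEP JUNIOR": 2608}


def scaled_half_width(ref_half_width, ref_games, games):
    """Scale a confidence half-width by sqrt(ref_games / games)."""
    return ref_half_width * math.sqrt(ref_games / games)


def prob_rating_at_least(tpr, half_width_95, threshold):
    """P(rating >= threshold), ASSUMING a normal error centred on the TPR."""
    z95 = NormalDist().inv_cdf(0.975)      # about 1.96 for a two-sided 95% band
    sigma = half_width_95 / z95            # convert the band to a standard deviation
    return 1.0 - NormalDist(mu=tpr, sigma=sigma).cdf(threshold)


half_width = scaled_half_width(SSDF_HALF_WIDTH, SSDF_GAMES, TOURNAMENT_GAMES)
print(f"95% half-width for {TOURNAMENT_GAMES} games: +-{half_width:.0f}")

probs = {}
for name, tpr in TPRS.items():
    probs[name] = prob_rating_at_least(tpr, half_width, 2700)
    print(f"{name}: {tpr - half_width:.0f}..{tpr + half_width:.0f}, "
          f"P(>= 2700) = {probs[name]:.2f}")
```

Under these assumptions the half-width comes out at about 373, FRITZ and HYDRA land above 90% for a rating of 2700 or more, and DEEP JUNIOR lands below 50%, which is in line with the estimates in the post.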
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.