Author: Miguel A. Ballicora
Date: 11:08:39 08/24/01
On August 24, 2001 at 12:36:18, Jeff Lischer wrote:

>On August 24, 2001 at 10:34:01, José de Jesús García Ruvalcaba wrote:
>
>>On August 24, 2001 at 10:16:48, Uri Blass wrote:
>>
>>>On August 24, 2001 at 10:06:32, Miguel A. Ballicora wrote:
>>>
>>>>On August 24, 2001 at 07:51:16, Uri Blass wrote:
>>>>
>>>>>On August 24, 2001 at 07:29:21, Günther Simon wrote:
>>>>>
>>>>>>On August 24, 2001 at 07:15:30, Uri Blass wrote:
>>>>>>
>>>>>>>On August 24, 2001 at 07:06:51, Uri Blass wrote:
>>>>>>>
>>>>>>>>Here are the results by
>>>>>>>>elostat program
>>>>>>>>
>>>>>>>>You can see that shredder is only 3rd place micro based on the performance.
>>>>>>>>Shredder is the world Micro champion by definition but Tiger and Rebel had a
>>>>>>>>better performance.
>>>>>>>>
>>>>>>>>
>>>>>>>> 1 Deep Junior 7                 : 2745 228 281 9 88.9 % 2384 22.2 %
>>>>>>>> 2 Quest (DeepFritz)             : 2550 266 169 9 66.7 % 2430 44.4 %
>>>>>>>> 3 Chess Tiger 14.6 Gambit Tiger : 2499 291 229 9 55.6 % 2461 22.2 %
>>>>>>>> 4 Crafty 18.10X                 : 2467 291 165 9 55.6 % 2428 44.4 %
>>>>>>>> 5 Rebel                         : 2466 291 229 9 55.6 % 2428 22.2 %
>>>>>>>> 6 Shredder                      : 2466 266 249 9 66.7 % 2346 22.2 %
>>>>>>>> 7 Goliath                       : 2421 291 165 9 55.6 % 2382 44.4 %
>>>>>>>> 8 Gromit 3.9.5                  : 2364 278 201 9 61.1 % 2285 33.3 %
>>>>>>>> 9 Ferret                        : 2359 291 229 9 55.6 % 2320 22.2 %
>>>>>>>>10 Gandalf 5.0                   : 2310 291 229 9 55.6 % 2271 22.2 %
>>>>>>>>11 ParSOS                        : 2256 291 229 9 55.6 % 2217 22.2 %
>>>>>>>>12 Diep                          : 2227 165 291 9 44.4 % 2265 44.4 %
>>>>>>>>13 IsiChess X                    : 2166 201 278 9 38.9 % 2245 33.3 %
>>>>>>>>14 Tao                           : 2165 229 291 9 44.4 % 2203 22.2 %
>>>>>>>>15 Ruy Lopez                     : 2118 366 266 9 33.3 % 2238  0.0 %
>>>>>>>>16 Pharaon                       : 2082 169 266 9 33.3 % 2202 44.4 %
>>>>>>>>17 SpiderGirl                    : 2014 213 255 9 27.8 % 2180 33.3 %
>>>>>>>>18 XiNiX                         : 1724 400 108 9  5.6 % 2216 11.1 %
>>>>>>>>
>>>>>>>>Congratulations also to the Deep Junior team for winning the event convincingly
>>>>>>>>when the difference from the second place is almost 200 elo and the hardware
>>>>>>>>explain less than 70 elo difference.
>>>>>>>>
>>>>>>>>Uri
>>>>>>>
>>>>>>>I can add that I think that it may be a better idea to use elostat to decide
>>>>>>>about the world champion in the future.
>>>>>>>
>>>>>>>I know that a lot of people are going to disagree but it is my opinion.
>>>>>>>I prefer a complicated method that does more justice and not a simple method.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>
>>>>>>Sorry Uri - but this is really nonsense.
>>>>>>You can't use ELO-Stat on a Swiss Tournament with 9 rounds as
>>>>>>it is described by the author. ELO-Stat is designed to calculate
>>>>>>ratings out of a pool of unknown rated progs with a very very lot
>>>>>>of games.
>>>>>>Therefore if you take a closer look at your table you would see that
>>>>>>the error margin is at least 435!pts (Pharaon) and max 632!! (RuyLopez).
>>>>>>And would you really believe Parallel SOS to be at 2256? :))
>>>>>
>>>>>The question is not which program is better.
>>>>>competitions of 9 rounds are not supposed to answer this question.
>>>>>
>>>>>The question is which program did better result.
>>>>>The elostat answer this question better than the ranking
>>>>
>>>>You forget the tournament strategy. Many times, you can adjust the contempt
>>>>because you know that a draw is extremely convenient or will give you the
>>>>title right away. Not to mention the selection of more or less aggressive opening
>>>>books for a special round. Sometimes, a draw is the same as a loss and you risk.
>>>>That throws away any significance of a performance ELO in a 9 round tournament.
>>>>This also applies for any human tournament.
>>>>
>>>>You can also have the weird situation where you got 8.5/9 and the one with 8/9
>>>>has a better elo performance. They drew each other but a couple of opponents
>>>>that play the 1st started to crash many games afterwards because of last-minute
>>>>changes in the code etc. That was totally out of control of the winner.
>>>
>>>
>>>I think that it is not logical
>>>If you get 8.5/9 your results are not worse than a player who got 8/9 and drew
>>>against you.
>>>
>>>We look for a stable rating
>>>Suppose that you got 8.5/9
>>>Suppose that the rating of the player you drew is better than your rating
>>>I can prove that your rating is not stable and is going to get bigger after the
>>>tournament.
>>>
>>>you do not lose rating from winning 8 games and the rating of the opponents is
>>>not important.
>>>you win rating from drawing one game against a player with better elo rating so
>>>the total result is that you earn rating.
>>>
>>>
>>>If the elostat let situation when 8/9 is better than 8.5/9 including a draw
>>>between the 2 best players then something is wrong with the elostat program.
>>>
>>>Uri
>>
>>Hi Uri,
>>please try the following experiment with elostat.
>>1. Players A, B, and C play each other, with the following individual results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>A beats C 100 to 0
>>Which ratings do you get for A, B and C using Elostat?
>>
>>2. The same players, but with the following results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>Same question as for part 1.
>>
>>If the program behaves correctly, the rating of A for part 1 should not be lower
>>than the rating of A for part 2.
>>José.
>
>Excellent question! Although one can't perform your experiment with ELOStat
>directly (because it only reads in PGN files), I can run it with code I have
>written simulating ELOStat. If I assume an average rating of 2000:
>
>ELOStat Results:
> Case 1. A = 2920, B = 2000, C = 1080
> Case 2. A = 2694, B = 2000, C = 1306
>
>This is a problem I've known about with ELOStat. The problem comes from ELOStat
>using the "average opponent" approach, which isn't strictly accurate because of
>the non-linearity of the Elo formula. (Example: If I am rated 2000 and I play
>someone rated 2400, I should score about 9%. If I play 2 people, one 2000 and the
>other 2800, I should score about 25%.)
>
>I have written a modified code that uses a "sum over opponents" approach (the
>idea was suggested to me by Walter Koroljow) to take care of this problem.
>Rather than using an average opponent rating, this method sums over all a
>player's opponents and calculates the expected rating of the player. With that
>modified approach I get the following:

Interesting. Several years ago I wrote a program to maintain volleyball rankings
(NCAA, the university championship in the US) using what I think is the approach
you mention here. This year I saw that there was this program ELOstat and I
thought it was already doing something similar. My program does not take a PGN
as input, though; that is difficult for volleyball :-)
I think I will run this case, but I will have to type in the results :-(

Regards,
Miguel

>
>Modified Method Results:
> Case 1. A = 2920, B = 2000, C = 1080
> Case 2. A = 2920, B = 2000, C = 1080
>
>
>Incidentally, here's the WMCCC performance results I found using the modified
>method using an average Elo of 2300:
>
>1. Junior 2829
>2. Fritz 2618
>3. Tiger 2551
>4. Shredder 2545
>5. Crafty 2514
>6. Rebel 2512
>7. Goliath 2461
>8. Ferret 2429
>9. Gromit 2424
>10. Gandalf 2323
>11. ParSOS 2260
>12. Diep 2251
>13. IsiChess 2104
>14. Tao 2082
>15. Ruy Lopez 1994
>16. Pharaon 1980
>17. SpiderGirl 1915
>18. XiniX 1612
>
>Shredder is still behind Tiger (barely), but this time ahead of Crafty and
>Rebel.
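
The quoted example (about 9% against one 2400-rated opponent, about 25% against a 2000 and a 2800) can be checked numerically, and the same setup shows why an "average opponent" performance rating differs from a "sum over opponents" one. This is only a minimal sketch: it assumes the usual logistic Elo expectation with a 400-point scale, which may not be the exact curve ELOStat or Jeff's modified code uses, and all function names are hypothetical, since neither program's source appears in this thread.

```python
import math


def expected_score(rating, opponent):
    """Expected score for one game under the logistic Elo curve (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def mean_expected(rating, opponents):
    """Average expected score against a list of opponent ratings."""
    return sum(expected_score(rating, o) for o in opponents) / len(opponents)


def _check_score(opponents, score):
    # A perfect or zero score has no finite performance rating on this curve.
    if not 0.0 < score < len(opponents):
        raise ValueError("score must be strictly between 0 and the number of games")


def performance_average_opponent(opponents, score):
    """'Average opponent' approach: invert the curve at the mean opponent rating."""
    _check_score(opponents, score)
    fraction = score / len(opponents)
    average = sum(opponents) / len(opponents)
    return average - 400.0 * math.log10(1.0 / fraction - 1.0)


def performance_sum_over_opponents(opponents, score, lo=-2000.0, hi=6000.0):
    """'Sum over opponents' approach: find, by bisection, the rating whose
    summed expected score over every opponent equals the actual score."""
    _check_score(opponents, score)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid, o) for o in opponents)
        if total < score:
            lo = mid  # rating too low to explain the score
        else:
            hi = mid
    return (lo + hi) / 2.0


# The two cases from the quoted example, for a 2000-rated player.
print(round(expected_score(2000, 2400), 3))        # about 0.09
print(round(mean_expected(2000, [2000, 2800]), 3))  # about 0.25

# Give that player exactly the expected score against 2000 and 2800.
score = 2 * mean_expected(2000, [2000, 2800])
print(round(performance_sum_over_opponents([2000, 2800], score)))  # about 2000
print(round(performance_average_opponent([2000, 2800], score)))    # about 2214
```

With these assumptions, a 2000-rated player who scores exactly as expected against a 2000 and a 2800 gets a performance of about 2000 from the sum-over-opponents version but about 2214 from the average-opponent version, which is the kind of distortion described in the quoted text.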
Last modified: Thu, 15 Apr 21 08:11:13 -0700