Author: Jeff Lischer
Date: 09:36:18 08/24/01
On August 24, 2001 at 10:34:01, José de Jesús García Ruvalcaba wrote:

>On August 24, 2001 at 10:16:48, Uri Blass wrote:
>
>>On August 24, 2001 at 10:06:32, Miguel A. Ballicora wrote:
>>
>>>On August 24, 2001 at 07:51:16, Uri Blass wrote:
>>>
>>>>On August 24, 2001 at 07:29:21, Günther Simon wrote:
>>>>
>>>>>On August 24, 2001 at 07:15:30, Uri Blass wrote:
>>>>>
>>>>>>On August 24, 2001 at 07:06:51, Uri Blass wrote:
>>>>>>
>>>>>>>Here are the results by
>>>>>>>elostat program
>>>>>>>
>>>>>>>You can see that shredder is only 3rd place micro based on the performance.
>>>>>>>Shredder is the world Micro champion by definition but Tiger and Rebel had a
>>>>>>>better performance.
>>>>>>>
>>>>>>>
>>>>>>>1 Deep Junior 7 : 2745 228 281 9 88.9 % 2384 22.2 %
>>>>>>>2 Quest (DeepFritz) : 2550 266 169 9 66.7 % 2430 44.4 %
>>>>>>>3 Chess Tiger 14.6 Gambit Tiger : 2499 291 229 9 55.6 % 2461 22.2 %
>>>>>>>4 Crafty 18.10X : 2467 291 165 9 55.6 % 2428 44.4 %
>>>>>>>5 Rebel : 2466 291 229 9 55.6 % 2428 22.2 %
>>>>>>>6 Shredder : 2466 266 249 9 66.7 % 2346 22.2 %
>>>>>>>7 Goliath : 2421 291 165 9 55.6 % 2382 44.4 %
>>>>>>>8 Gromit 3.9.5 : 2364 278 201 9 61.1 % 2285 33.3 %
>>>>>>>9 Ferret : 2359 291 229 9 55.6 % 2320 22.2 %
>>>>>>>10 Gandalf 5.0 : 2310 291 229 9 55.6 % 2271 22.2 %
>>>>>>>11 ParSOS : 2256 291 229 9 55.6 % 2217 22.2 %
>>>>>>>12 Diep : 2227 165 291 9 44.4 % 2265 44.4 %
>>>>>>>13 IsiChess X : 2166 201 278 9 38.9 % 2245 33.3 %
>>>>>>>14 Tao : 2165 229 291 9 44.4 % 2203 22.2 %
>>>>>>>15 Ruy Lopez : 2118 366 266 9 33.3 % 2238 0.0 %
>>>>>>>16 Pharaon : 2082 169 266 9 33.3 % 2202 44.4 %
>>>>>>>17 SpiderGirl : 2014 213 255 9 27.8 % 2180 33.3 %
>>>>>>>18 XiNiX : 1724 400 108 9 5.6 % 2216 11.1 %
>>>>>>>
>>>>>>>congratulations also to the Deep Junior team for winning the event convincingly
>>>>>>>when the difference from the second place is almost 200 elo and the hardware
>>>>>>>explains less than 70 elo difference.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>I can add that I think that it may be a better idea to use elostat to decide
>>>>>>about the world champion in the future.
>>>>>>
>>>>>>I know that a lot of people are going to disagree but it is my opinion.
>>>>>>I prefer a complicated method that does more justice and not a simple method.
>>>>>>
>>>>>>Uri
>>>>>
>>>>>
>>>>>Sorry Uri - but this is really nonsense.
>>>>>You can't use ELO-Stat on a Swiss Tournament with 9 rounds as
>>>>>it is described by the author. ELO-Stat is designed to calculate
>>>>>ratings out of a pool of unknown rated progs with a very very lot
>>>>>of games.
>>>>>Therefore if you take a closer look at your table you would see that
>>>>>the error margin is at least 435!pts (Pharaon) and max 632!! (RuyLopez).
>>>>>And would you really believe Parallel SOS to be at 2256? :))
>>>>
>>>>The question is not which program is better.
>>>>competitions of 9 rounds are not supposed to answer this question.
>>>>
>>>>The question is which program did better result.
>>>>The elostat answer this question better than the ranking
>>>
>>>You forget the tournament strategy. Many times, you can adjust the contempt
>>>because you know that a draw is extremely convenient or will give you the
>>>title right away. Not to mention the selection of more or less aggressive opening
>>>books for a special round. Sometimes, a draw is the same as a loss and you risk.
>>>That throws away any significance of a performance ELO in a 9 round tournament.
>>>This also applies for any human tournament.
>>>
>>>You can also have the weird situation where you got 8.5/9 and the one with 8/9
>>>has a better elo performance. They drew each other but a couple of opponents
>>>that play the 1st started to crash many games afterwards because of late minute
>>>changes in the code etc. That was totally out of control of the winner.
>>
>>
>>I think that it is not logical
>>If you get 8.5/9 your results are not worse than a player who got 8/9 and drew
>>against you.
>>
>>We look for a stable rating
>>Suppose that you got 8.5/9
>>Suppose that the rating of the player you drew is better than your rating
>>I can prove that your rating is not stable and is going to get bigger after the
>>tournament.
>>
>>you do not lose rating from winning 8 games and the rating of the opponents is
>>not important.
>>you win rating from drawing one game against a player with better elo rating so
>>the total result is that you earn rating.
>>
>>
>>If the elostat let situation when 8/9 is better than 8.5/9 including a draw
>>between the 2 best players then something is wrong with the elostat program.
>>
>>Uri
>
>Hi Uri,
>please try the following experiment with elostat.
>1. Players A, B, and C play each other, with the following individual results:
>A beats B 99.5 to 0.5
>B beats C 99.5 to 0.5
>A beats C 100 to 0
>Which ratings do you get for A, B and C using Elostat?
>
>2. The same players, but with the following results:
>A beats B 99.5 to 0.5
>B beats C 99.5 to 0.5
>Same question as for part 1.
>
>If the program behaves correctly, the rating of A for part 1 should not be lower
>than the rating of A for part 2.
>José.

Excellent question! Although one can't perform your experiment with ELOStat directly (because it only reads in PGN files), I can run it with code I have written simulating ELOStat. If I assume an average rating of 2000:

ELOStat Results:
Case 1. A = 2694, B = 2000, C = 1306
Case 2. A = 2920, B = 2000, C = 1080

This is a problem I've known about with ELOStat. In Case 1, A's extra 100-0 result against C lowers A's average opponent rating (C is much weaker than B), and the formula turns that into a lower rating for A, even though the extra result cannot count against A. The problem comes from ELOStat using the "average opponent" approach, which isn't strictly accurate because the Elo expected-score curve is non-linear. (Example: if I am rated 2000 and I play someone rated 2400, I should score about 9%. If I play two people, one rated 2000 and the other 2800, whose average is also 2400, I should score about 25%.) I have written modified code that uses a "sum over opponents" approach (the idea was suggested to me by Walter Koroljow) to take care of this problem.
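The 9% versus 25% example can be checked in a few lines. This is a minimal sketch that assumes the logistic form of the Elo expected score, E = 1 / (1 + 10^((R_opp - R) / 400)), which reproduces both figures; whether ELOStat uses exactly this curve is an assumption, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score per game under the logistic Elo curve (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


# A 2000 player against a single opponent rated 2400.
single = expected_score(2000, 2400)

# The same player against two opponents, 2000 and 2800: their average is also 2400.
pair = (expected_score(2000, 2000) + expected_score(2000, 2800)) / 2

print(f"one opponent at 2400:       {single:.1%}")  # 9.1%
print(f"opponents at 2000 and 2800: {pair:.1%}")    # 25.5%
```

Averaging the opponents first and then applying the curve gives 9.1%; applying the curve to each opponent and then averaging gives 25.5%. The two only agree when the curve is roughly linear over the range of opponent ratings, which is not the case for large rating gaps.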
Rather than using an average opponent rating, the modified method sums the expected scores against each of a player's opponents individually and solves for the rating at which that sum equals the player's actual score. With that modified approach I get the following:

Modified Method Results:
Case 1. A = 2920, B = 2000, C = 1080
Case 2. A = 2920, B = 2000, C = 1080

Incidentally, here are the WMCCC performance results I found with the modified method, using an average Elo of 2300:

1. Junior 2829
2. Fritz 2618
3. Tiger 2551
4. Shredder 2545
5. Crafty 2514
6. Rebel 2512
7. Goliath 2461
8. Ferret 2429
9. Gromit 2424
10. Gandalf 2323
11. ParSOS 2260
12. Diep 2251
13. IsiChess 2104
14. Tao 2082
15. Ruy Lopez 1994
16. Pharaon 1980
17. SpiderGirl 1915
18. XiniX 1612

Shredder is still behind Tiger (barely), but this time ahead of Crafty and Rebel.
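José's experiment can be reproduced with a small solver that runs both approaches side by side. This is a hypothetical re-implementation, not ELOStat's code or the code described in this post. It assumes the logistic Elo curve, a simultaneous update of every rating on each sweep, and a pool mean re-centred at 2000 after every sweep. `average_opponent` adds the Elo difference for the player's total percentage to the games-weighted average opponent rating; `sum_over_opponents` finds, by bisection, the rating whose summed per-opponent expected scores equal the actual score. All function names are illustrative.

```python
import math

MEAN_RATING = 2000.0  # pool average, as assumed in the post


def expected(rating: float, opponent: float) -> float:
    """Logistic Elo expected score per game (an assumption about ELOStat's curve)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def build_games(results):
    """{(a, b): (points_for_a, games)} -> {player: [(opponent, points, games), ...]}."""
    games = {}
    for (a, b), (points, n) in results.items():
        games.setdefault(a, []).append((b, points, n))
        games.setdefault(b, []).append((a, n - points, n))
    return games


def average_opponent(rows, ratings):
    """Games-weighted average opponent rating + Elo difference for the total score."""
    n = sum(g for _, _, g in rows)
    p = sum(pts for _, pts, _ in rows) / n
    avg_opp = sum(ratings[o] * g for o, _, g in rows) / n
    return avg_opp + 400.0 * math.log10(p / (1.0 - p))


def sum_over_opponents(rows, ratings):
    """Rating whose summed expected scores vs each opponent equal the actual score."""
    score = sum(pts for _, pts, _ in rows)
    lo = min(ratings[o] for o, _, _ in rows) - 4000.0
    hi = max(ratings[o] for o, _, _ in rows) + 4000.0
    for _ in range(100):  # bisection: total expected score rises with rating
        mid = (lo + hi) / 2
        if sum(g * expected(mid, ratings[o]) for o, _, g in rows) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def solve(results, update, sweeps=200):
    """Apply `update` to all players at once each sweep, re-centring on MEAN_RATING."""
    games = build_games(results)
    for player, rows in games.items():
        total = sum(pts for _, pts, _ in rows)
        if total <= 0 or total >= sum(g for _, _, g in rows):
            raise ValueError(f"{player}: 0% or 100% has no finite rating")
    ratings = {p: MEAN_RATING for p in games}
    for _ in range(sweeps):
        new = {p: update(rows, ratings) for p, rows in games.items()}
        shift = MEAN_RATING - sum(new.values()) / len(new)
        ratings = {p: r + shift for p, r in new.items()}
    return ratings


case1 = {("A", "B"): (99.5, 100), ("B", "C"): (99.5, 100), ("A", "C"): (100.0, 100)}
case2 = {("A", "B"): (99.5, 100), ("B", "C"): (99.5, 100)}

for name, results in (("Case 1", case1), ("Case 2", case2)):
    for label, method in (("average opponent", average_opponent),
                          ("sum over opponents", sum_over_opponents)):
        r = solve(results, method)
        print(f"{name}, {label}: " + ", ".join(f"{p} = {r[p]:.0f}" for p in "ABC"))
```

Under these assumptions the average-opponent update gives A = 2694 in Case 1 and A = 2920 in Case 2, while the sum-over-opponents update gives A = 2920 in both cases. A player with a 0% or 100% total score has no finite rating under either method, so the sketch rejects that input rather than iterating forever.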
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.