Author: Sune Fischer
Date: 17:38:04 07/15/05
On July 15, 2005 at 20:07:46, Dann Corbit wrote:

>On July 15, 2005 at 19:27:56, Sune Fischer wrote:
>
>>On July 15, 2005 at 18:29:28, Dann Corbit wrote:
>>>
>>>The time difference is really about 4:1.
>>
>>Yes, but they use faster machines.
>>
>>Anyway, the point is simply that what we call quality today is what we will call crap in 3 years, just as what we called quality 3 years ago is what we call crap today.
>>
>>Am I the only one who can see how ridiculous that is?
>
>In 1905 you would have been happy with a car that had a top speed of 25 MPH. Today you wouldn't be. To me, it is like saying, "There is no need to make that any more beautiful because it is already beautiful enough."

I fear you're missing my point. I want longer games too, but first and foremost I want enough games to make an accurate rating list; without that we have nothing worthwhile anyway. If that can be achieved by playing 2 or 4 times faster, then that is what should be done. It seems folks insist on pushing the time control to such lengths that making accurate rating lists isn't possible. It is silly to sacrifice rating accuracy for something as elusive as "quality". People will never be happy with the quality anyway, so why chase this ghost? Let's make it accurate instead; that, at least, we can do.

>No matter how beautiful it is, it can always become more beautiful. And I will like it better when it becomes such. Eventually, I suppose, I will no longer be able to appreciate the moves. Then it will be time to look at something else, I suppose.
>
>>>>How many people actually go over these tons and tons of automated games anyway?
>>>
>>>Not many. I also find the games between the best programs very useful for book building. I would not trust the CEGT games for that purpose.
>>
>>I would still prefer to use GM games.
>>It needs another few years for "the quality" to be there ;)
>
>Mostly, the opening books for the computer programs already came from there.
>And I think that at 40/2 with ponder on, on 1 GHz hardware, the computer will make fewer mistakes than any GM. Yes, they will miss some brilliant positional moves, but on average they will make excellent choices. Actually, I think a mix of SSDF + correspondence + OTB games makes the best books (as far as auto-generated books go). The "real" best books will be made by experts.
>
>>>>In order to construct a usable and interesting rating list, priority number 1 is to have enough games for a reliable rating; otherwise it _is_ going to be statistical garbage.
>>>
>>>Controlling the environment of the test so that it is reproducible is probably in the same range of importance as the large number of games.
>>
>>I guess saving the logs should be enough; who is going to reproduce a long tournament anyway? :)
>>
>>But actually I agree with you, which is why I _don't_ like that the SSDF uses books and learning. Fixed start positions give full control every single time.
>>
>>>The AEGT and CEGT games do not seem to be held at a consistent time control.
>>
>>That's obviously not so good, but it is probably the price you have to pay when making a rating list in a big, distributed manner.
>
>It may be that the different time controls are really what is wanted, if the machines have been calibrated to some certain number of nodes or something. At any rate, I think the CEGT stuff is very good data.

That was going to be my defense, but then I noticed that some games had 0 increment and others did not.

>>>The AEGT and CEGT contests use the Nunn positions as openings, and so they do not exercise the opening book of the program being tested. That is fine for measuring engine strength, but it will not tell you about book+program, and it will not help you prepare for that opponent (if that is a goal).
>>
>>People tend to make their own books, and in tournaments the authors always use special (handcrafted) books, so in general it's a good idea to keep engine and book separated when measuring.
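To put a rough number on "enough games for a reliable rating", here is a sketch of how Elo error bars shrink with the number of games. It is an approximation under stated assumptions and not ELOStat's actual calculation. Games are treated as independent, scores are mapped to Elo with the logistic curve, and the mean score gets a normal-approximation 95% interval (z = 1.96). The 30% draw rate is a hypothetical default, and the function names are illustrative. Because the margin scales roughly as 1/sqrt(games), playing 4 times faster (4 times as many games in the same wall-clock time) should roughly halve the error bars.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p under the logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_margin(games: int, score: float = 0.5, draw_rate: float = 0.3,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate (+, -) Elo error bars for a result over `games` games.

    Each game scores 1, 0.5 or 0. With win rate w and draw rate d,
    score = w + d/2 and the per-game variance is w + d/4 - score**2.
    Games are assumed independent (normal approximation, z = 1.96 ~ 95%).
    """
    wins = score - draw_rate / 2.0
    if wins < 0 or wins + draw_rate > 1:
        raise ValueError("score and draw_rate are inconsistent")
    variance = max(wins + draw_rate / 4.0 - score ** 2, 0.0)
    se = math.sqrt(variance / games)      # standard error of the mean score
    hi = min(score + z * se, 1 - 1e-9)    # clamp so the log stays finite
    lo = max(score - z * se, 1e-9)
    centre = elo_from_score(score)
    return elo_from_score(hi) - centre, centre - elo_from_score(lo)


if __name__ == "__main__":
    # Same 50% score and draw rate; only the number of games changes.
    for n in (10, 100, 1000, 10000):
        plus, minus = elo_margin(n)
        print(f"{n:>6} games: +{plus:.0f} / -{minus:.0f} Elo")
```

Under these assumptions, 10 games leave error bars of roughly ±200 Elo, while 10,000 games bring them down to a few Elo. That gap is the "statistical garbage" versus "reliable rating" distinction discussed here.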
>
>Depends on what you want to measure. If it is engine strength, then I agree with you. If it is system strength, then I think the results will be wrong.

Well, that's true. :)

>>>The older programs have more games against them and are therefore more accurate as measuring tools. But a lot of people get hot under the collar about running games on 450 MHz computers when they do that.
>>
>>It doesn't matter if you have stable engines. What you do is run ELOStat on the whole database every time, so the ratings will automatically rescale.
>>
>>At least, it seems foolish to play with an old engine once a newer version has been released. Remember, you will still be playing against the old engine indirectly when you play against other engines that have played against it.
>
>If I have ten games with a new engine of 3000 Elo and I have 10,000 games with an old engine of 2300 Elo, the old engine will give me much better data by playing against it than against the new one. The new engine will have huge error bars in the confidence interval, and these must necessarily carry over to the engine for which the calibrated engine is used as a reference. I exaggerate the numbers to make the meaning obvious, but the message is plain enough -- you get better numbers from the measuring sticks with the finest graduations on them.

Here you simply end up with an ever-growing list of engines you have to keep playing against, and because they play more and more games, it gets harder and harder to take them out. How do you break this circle? Take out the old engines and just keep playing against their opponents; there must be many of those.

>>>There is some inconsistent naming of the program names in CEGT and AEGT,
>>
>>Such as?
>
>AnMon 5.50                 : 2556  81 114   38  47.4 %  2575  36.8 %
>AnMon5.50                  : 2543  17  18 1088  41.2 %  2605  31.6 %
>
>GLC 3.01.2.2               : 2490  41  30  265  52.5 %  2473  35.5 %
>GLChess 3.01.2.2           : 2561 128 219   13  46.2 %  2588  46.2 %
>GLChess 3.0122             : 2534  86 114   38  47.4 %  2552  31.6 %
>Green Light Chess 3.01.2.2 : 2561  24  32  423  46.0 %  2589  37.6 %
>Green Light Chess 301.2.2  : 2547 126 126   22  50.0 %  2547  36.4 %
>GreenLightChess 30122      : 2547 210 210   11  50.0 %  2547  27.3 %
>
>Knight Dreamer 3.3         : 2372 102  73   51  24.5 %  2568  29.4 %
>KnightDreamer 3.3          : 2367  31  23  497  50.8 %  2361  30.0 %
>
>Naum 1.8-b1                : 2609 110  65   38  55.3 %  2572  52.6 %
>Naum 1.8b1                 : 2578  95  70   51  53.9 %  2551  37.3 %
>
>Pepito 1.59                : 2574  93  93   38  50.0 %  2574  36.8 %
>Pepito v1.59               : 2434  32  27  417  52.9 %  2414  28.1 %
>
>Ruffian 2.1.0              : 2644  16  12 1715  52.3 %  2628  34.2 %
>Ruffian 2.10               : 2573 147 453    5  30.0 %  2720  60.0 %
>
>There are several others I am not sure of, like this one:
>Shredder 9                 : 2756  10  14 2666  69.1 %  2616  29.4 %
>Shredder 9 UCI             : 2713 128 104   26  61.5 %  2631  38.5 %

Hmm, annoying. One would think it should be possible to keep the names consistent.

-S
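The columns in Dann's excerpt appear to follow ELOStat's usual layout: rating, + and - error bars, games, score, average opponent rating, and draw percentage. A list maintainer could catch most of these split entries automatically before publishing. The sketch below is a heuristic under stated assumptions, and its helper names are hypothetical. It splits each name into program and version at the first digit and ignores case, spaces, dots, hyphens, and a leading "v" on the version. It also uses a small hand-written alias table (the `ALIASES` entries are assumptions) for abbreviations such as "GLC" that no string rule can infer. Stripping dots can also merge genuinely different versions (1.10 and 11.0 both become 110). "Shredder 9" versus "Shredder 9 UCI" is deliberately left apart. The output is therefore a list of candidates for a human to confirm, not an automatic merge.

```python
import re
from collections import defaultdict

# Hypothetical alias table for abbreviations that no string rule can infer.
ALIASES = {"glc": "greenlightchess", "glchess": "greenlightchess"}

# Program name = everything before the first digit; an optional "v" prefix
# on the version ("Pepito v1.59") is dropped.
NAME_VERSION = re.compile(r"^(.*?)\s*v?(\d.*)$", re.IGNORECASE)


def squash(text: str) -> str:
    """Lower-case and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def engine_key(name: str) -> tuple[str, str]:
    """Reduce an engine name to a (program, version) comparison key."""
    match = NAME_VERSION.match(name.strip())
    program, version = match.groups() if match else (name, "")
    program = squash(program)
    return ALIASES.get(program, program), squash(version)


def candidate_duplicates(names: list[str]) -> list[list[str]]:
    """Group names that collapse to the same key; keep groups of 2 or more."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for name in names:
        groups[engine_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["AnMon 5.50", "AnMon5.50", "GLC 3.01.2.2", "GreenLightChess 30122",
             "Pepito 1.59", "Pepito v1.59", "Shredder 9", "Shredder 9 UCI"]
    for group in candidate_duplicates(names):
        print(" = ".join(group))
```

If entries are merged, the ratings should presumably be recomputed from the pooled games rather than averaged. The split entries rest on very different game counts (5 versus 1715 for the two Ruffian names).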
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.