Author: James T. Walker
Date: 08:26:19 01/01/05
On January 01, 2005 at 02:37:02, Kurt Utzinger wrote:

>On December 31, 2004 at 15:03:40, James T. Walker wrote:
>
>>On December 31, 2004 at 12:18:36, Kurt Utzinger wrote:
>>
>>>Matches at 40’/40 + 40’/40 + 40’ time control
>>>Junior9-GUI, ponder=off, 3-/4-men EGTB
>>>own books, no book learning, no learning
>>>on 4 Athlons 1.3/64 MB hash for all engines
>>>Details and games for download as usual at
>>>http://www.utzingerk.com/jun9_test.htm
>>>Regards
>>>Kurt
>>>
>>>(3) Junior 9 : 300 (+109, =97, -94), 52.5 %
>>>
>>>The King 3.23 T05 : 50 (+18, =20, -12), 56.0 %
>>>Chess Tiger 15.0  : 50 (+20, =17, -13), 57.0 %
>>>Fritz 8           : 50 (+22, =15, -13), 59.0 %
>>>Hiarcs 9          : 50 (+13, =17, -20), 43.0 %
>>>Shredder 8        : 50 (+11, =18, -21), 40.0 %
>>>Gandalf 6.0       : 50 (+25, =10, -15), 60.0 %
>>
>>Hello Kurt,
>>While I find your results interesting, and others have posted similar results
>>with "ponder off/no learning", I have to wonder whether these tests are
>>worthwhile. The problem is that pondering is part of the program. If you are
>>trying to test which program is best at playing chess, then crippling all of
>>them is not necessarily crippling them equally. What if some programs are
>>better at predicting their opponents' moves and therefore gain an advantage
>>by pondering more accurately?
>
>No doubt: with ponder=off the engines may not play out the
>same moves. On the other hand, it is by now well known that
>results of matches with ponder on and ponder off are about
>equal, see http://www.pittlik.de/winboard/ponder.html
>and so I don't worry about it.
>Kurt

Well, again I'm skeptical. I'm seeing strange results here that do not match
mine or the SSDF's, and all of them are with ponder off on one machine. I can
understand wanting to watch two programs play against each other; I personally
find it fascinating. People do post their settings and setup when posting
results, and that's great, but most are not the "normal" playing conditions
that would be used in, say, a real tournament. I don't understand defending
the results or setups as if they were some scientific test under official
conditions that proves one program is better than the other. I have proven
time and again that, even under default settings and what I consider real
tournament conditions, I can manipulate the ratings of programs by choosing
their opponents and how many games are played. This is probably what is wrong
with my testing and the SSDF's. As you say, it is just more data to add to the
pile, but it still proves nothing except that most of the top programs are
near each other.

What disturbs me is that the last few years have shown no real improvement by
the "latest" programs. They come out with a new GUI full of bugs and a chess
engine that is within a few points of its predecessor. This is not
"advancement". I don't see any new innovations coming out. I only see slight
changes in tuning, which make for a slightly different style but no real
improvement in strength.

>>The same goes for learning/book learning. I'm getting suspicious that most of
>>the improvement in new programs is just some "book-up" tricks against certain
>>programs to gain quick Elo points. Disabling learning will allow these
>>"tricks" to work continuously, while book learning/learning would eventually
>>nullify them.
>
>You are of course right. But our aim has never been to test
>the goodness of book learning but rather to find out the "naked"
>strength of a chess program.
>It's another way of thinking, and
>if it were up to me I would never use the opening books delivered
>and would only play with Nunn2, Noomen select positions, 5moves.ctg or
>similar books, for the same reason: I am not interested in testing
>how good or bad an opening book is. And done this way, tests can
>be compared much better, as the engines are free from the positive
>and negative influences of learning and book lines.
>Kurt

I'm not testing the goodness of book learning but the goodness of the complete
program as it comes out of the box, not hacked up. Why would you want a program
"free from positive/negative influences of learning"? Learning is as natural as
having sex. Everyone should do it, but some do it better than others. Also,
don't you want your new programs to keep up with opening book theory?

Testing programs on specific positions to "prove" which one is best at
analyzing positions is also a dream. For any given position, nobody can predict
in advance which program will give the best analysis. Each position is unique
and has its own special requirements for getting the best answer. Some programs
will meet those requirements to some degree in one position and completely fail
in the very next one. Sometimes you see a low-rated program solve a problem
faster than one of the top pros, probably for the wrong reason.

>>I don't know if you've seen my blitz database ratings, but it seems the
>>longer I play them the closer they get in ratings. My ratings also closely
>>imitate the SSDF list by showing only a few points of difference between the
>>Chess Tigers and Shredders.
>
>I have not seen your blitz database ratings but think that
>your testing also contributes to an overall impression
>of the strength of the various chess programs. The more data
>we have under different conditions, the better. Personally,
>I have never been interested in Elo and have therefore abstained
>from creating my own list from the many thousands of games we
>have played in the past few years.
>Kurt
>
>>Junior programs are showing up in the same fashion lately. I
>>currently have Junior 9 trailing Junior 8 by 2 Elo points.
>
>This looks somewhat strange. In my opinion, Junior9 is
>considerably stronger than Junior8 and a lot more reliable
>in analysis mode.
>Kurt

Of course you have seen different results under different circumstances, so
you are influenced by those results. This does not support your contention
that tests with learning/pondering off are "about equal" to mine or the
SSDF's. I have seen many results posted here that are very strange compared to
my results and to the SSDF's. Those results are what make me question the
value of hacking a program down to its "basic" function of searching.

>>I'm getting suspicious that top programs are hitting a "wall" and showing no
>>real improvement in strength, only a change in the way they play.
>
>Interesting comment but difficult to prove -:)
>Kurt

Really? Compare Chess Tiger 14/15/2004. Compare Fritz 7/8. Even Shredder 7/8
shows only a small improvement. And my results on Junior 8/9 show no real
increase in strength, although the SSDF results will take some time. Hiarcs 7/8
showed nothing, but Hiarcs 9 did improve over them. Of course, lower-rated
programs seem to have more room for improvement, but if they are using similar
techniques of search/eval/pruning/time control then they will eventually hit
the same wall. I think some new innovative technique is required at this point;
otherwise all improvements of substantive value will only come from hardware.
>>Just food for thought.
>>Regards,
>>Jim
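
[Editorial sketch, not part of the original post.] Jim's statistical point, that a
few hundred games cannot separate programs that are only a handful of Elo points
apart, can be made concrete. The following Python sketch assumes only the standard
logistic Elo model; the function names are illustrative, and the example input is
Kurt's 300-game Junior 9 total (+109 =97 -94) quoted above.

import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_summary(wins: int, draws: int, losses: int):
    """Return (score %, Elo difference, rough 95% Elo margin) for a W/D/L result."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance estimated from the observed win/draw/loss split.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)                # standard error of the score fraction
    lo, hi = score - 1.96 * se, score + 1.96 * se
    elo = elo_from_score(score)
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    return 100.0 * score, elo, margin

if __name__ == "__main__":
    # Kurt's overall result: Junior 9 scored +109 =97 -94 over 300 games.
    pct, elo, margin = match_summary(109, 97, 94)
    print(f"Score {pct:.1f}% -> about {elo:+.0f} Elo, +/- {margin:.0f} at 95%")
    # Prints roughly: Score 52.5% -> about +17 Elo, +/- 32 at 95%
    # i.e. even 300 games cannot reliably separate programs ~30 Elo apart.

Under the same approximation, each 50-game sub-match in the table carries an error
margin on the order of +/- 80 Elo, which is one way to read the wide swings in the
per-opponent percentages.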