Author: James T. Walker
Date: 14:28:32 01/17/06
On January 17, 2006 at 16:47:53, Heinz van Kempen wrote:

>On January 17, 2006 at 15:18:00, Dann Corbit wrote:
>
>>On January 17, 2006 at 15:11:24, Joseph Ciarrochi wrote:
>>
>>>I find it difficult to empirically demonstrate that programs differ in their
>>>relative ranking, depending on time (not including time controls below 4 minutes
>>>mind you, where weird stuff does happen). Ponder is kind of like increasing time
>>>to think.
>>>
>>>For example, the CEGT blitz ratings correlate .99 with the CEGT longer time
>>>controls.
>>>
>>>I would love to see evidence of changes in rank due to differences in ponder or
>>>time.
>>>
>>>Maybe Hiarcs 10 Hyper Modern is a candidate for being better at blitz. E.g.,
>>>check out its standard rating
>>>(http://www.husvankempen.de/nunn/cegtrating4040all.html)
>>>
>>>versus its blitz rating (http://www.husvankempen.de/nunn/eloblitzall.html)
>>>
>>>Fruit, in contrast, may be a little weaker in blitz.
>>
>>Ktulu 75 seems to prosper at 40/120 in my tests.
>>Glaurung 1.01 seems to prosper at blitz in my tests.
>
>Hi Dann, Joseph and all,
>
>there seem to be exceptions to these comparisons. Many testers claim that
>especially Gandalf and Junior will gain from more time and I am just seeing that
>Zappa and Crafty Cito profit considerably from more time (or two CPUs) and
>there are good Blitz engines, too.
>
>About ponder on tournaments: We could run them in CEGT, but testers' opinion so
>far is that this would be a waste of CPU time (because there is only a certain
>percentage of ponder hits) and so it would be more or less the same to give more
>time for all instead.

It's this "more or less" that may be the difference. You are basically assuming that all engines ponder equally, and I don't believe they do. The main reason for my belief is that my Elo rankings differ slightly from CEGT's. Since I started my blitz database, all my games have been auto232 with native books and, of course, ponder on.

Since I purchased an AMD 64x2, I have been running eng/eng matches with ponder on and native books, with the exception of Rybka, of course, since it has no book yet. I have been keeping Rybka games separate from my original database until it is complete with book and endgame TBs. In both the "Rybka" database and my original 30,000 game database, Toga II is rated lower than Fruit 2.2.1. I have a hard time understanding opinions here that Toga is stronger than Fruit. I assume it's either because I'm testing at blitz only or because I do "ponder on" testing (or both). I can't get past that paradigm because of my own experience with my testing.

Jim

>
>Currently I do not think it would be a good idea to adapt the benchmark to
>faster hardware. The reasons are that this would demand too much from those
>testers using slower machines and it would also give fewer games for each
>engine. Results would not have the same statistical value and fewer amateur
>engines would be tested. In my opinion CEGT should not only be high end testing
>for ChessBase and a few more currently storming to the top. Percentage of those
>games is already high enough. When opinions from other testers differ here we
>can still discuss this and just see what we can do at this point.
>
>Best Regards
>Heinz
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.