Author: Vasik Rajlich
Date: 14:46:13 01/30/06
On January 29, 2006 at 22:29:36, Albert Silver wrote:

>>>>>>I am 99.9% sure that ultrasolid is either better than solid in all of the Betas,
>>>>>>or worse than solid in all of the Betas.
>>>>>
>>>>>Perhaps a 4th opponent is necessary. In my testing, UltraSolid scores a bit
>>>>>better with Fritz 9, about the same against Hiarcs 10, and worse against Fruit
>>>>>2.2. Worse enough that it kills whatever it gains from Fritz 9 and then some, so
>>>>>that after the 150 games of the 3 matches it actually does a fraction worse.
>>>>>This has happened twice though. Maybe I'll try one of its most difficult
>>>>>opponents in my previous testing: Gambit Fruit 4bx, and see what happens.
>>>>>
>>
>>This is tricky. I would be surprised (but not shocked) if such an effect
>>existed.
>>
>>However, the thing to keep in mind is that by looking for such patterns, you
>>effectively "dilute" your data.
>>
>>If you play enough games, and entertain enough hypotheses, then some of them
>>will be true by accident.
>
>Well, look at it this way, I tested different settings, including the default,
>and got this:
>
>The default settings against Fritz 9 and Hiarcs 10 (Hypermodern) scored:
>
>1 Rybka 1.01 Beta 12 32-bit 2900 +22/-11/=17 61.00% 30.5/50
>2 Hiarcs 10 2850 +11/-22/=17 39.00% 19.5/50
>
>1 Rybka 1.01 Beta 12 32-bit 2900 +19/-15/=16 54.00% 27.0/50
>2 Fritz 9 2820 +15/-19/=16 46.00% 23.0/50
>
>1 Rybka 1.01 Beta 12 32-bit 2900 +25/-10/=15 65.00% 32.5/50
>2 Fruit 2.2 2850 +10/-25/=15 35.00% 17.5/50
>
>Total: 90 / 150
>
>I then tested:
>
>Improving Position = Slightly Optimistic,
>Deteriorating Position = Much More Pessimistic
>
>1 Rybka 1.01 Beta 12 32-bit 2900 +27/-7/=16 70.00% 35.0/50
>2 Hiarcs 10 2850 +7/-27/=16 30.00% 15.0/50
>
>1 Rybka 1.01 Beta 12 32-bit 2900 +20/-15/=15 55.00% 27.5/50
>2 Fritz 9 2820 +15/-20/=15 45.00% 22.5/50
>
>1 Rybka 1.01 Beta 12 32-bit 2900 +25/-11/=14 64.00% 32.0/50
>2 Fruit 2.2 2850 +11/-25/=14 36.00% 18.0/50
>
>Total: 94.5 / 150
>
>A better score but only in one match, even if by quite a bit. Is it a fluke, due
>in part to the fast time control?
>
>I then tried adding UltraSolid to the above, noting that I had tested it once
>before and it had done better with Fritz 9, but worse (than default) with Fruit
>2.2.
>
>Improving Position = Slightly Optimistic,
>Deteriorating Position = Much More Pessimistic
>
>
>1 Rybka 1.01 Beta 13 32-bit 2900 +26/-11/=13 65.00% 32.5/50
>2 Hiarcs 10 2850 +11/-26/=13 35.00% 17.5/50
>
>1 Rybka 1.01 Beta 13 32-bit 2900 +26/-14/=10 62.00% 31.0/50
>2 Fritz 9 2820 +14/-26/=10 38.00% 19.0/50
>
>1 Rybka 1.01 Beta 13 32-bit 2900 +25/-15/=10 60.00% 30.0/50
>2 Fruit 2.2 2850 +15/-25/=10 40.00% 20.0/50
>
>Total: 93.5 / 150
>
>As you can see, it is unclear whether UltraSolid is simply worse with Fruit 2.2,
>but overall better, or whether it just isn't better. Since Toga and Fruit are so
>close of kin, I tend to think that Hurd's results confirm the lack of
>suitability of UltraSolid with Fruit and Co. The question still remains as to
>whether or not it is a somewhat isolated phenomenon. In fact, even my changes
>only really appeared in one match, so they too bear further investigation.
>

Let me put it like this: what is the chance at this point (i.e. just based on the
games you list above, without any further testing) that your hypothesis is
correct?

If you analyze it "statistically", you might get some figure like (let's say)
30%. Just a total wild guess, eyeballing your numbers.

In reality, though, it's probably more like 3%. The reason is that before your
experiment started, there were let's say 20 candidate hypotheses that you
didn't even bother to list. Maybe ultrasolid is worse against Fritz, maybe it's
better in closed positions, etc. One of these hypotheses is likely to get lucky,
and this hypothesis will then of course have very nice data to support it.

Anyway, there is nothing wrong with this procedure, as long as you eventually
test your hypothesis "straight up".

Vas

>One thing is clear, and that is that testing against one opponent is risky, no
>matter how strong. Even as strong as Rybka.
>
>>Of course, once you identify a particularly promising hypothesis, you can test
>>it further and get an honest result.
>
>That is the whole idea of course! :-)
>
> Albert
>
>>
>>Anyway, it's late so I probably am not making much sense :)
>>
>>Vas
>>
>>>>>
>>>>>>
>>>>Ok I'll do the same. I hear what Vasik says and he is the author. I bet you can
>>>>sense a but coming on, have a look at this:
>>>>
>>>>http://www.talkchess.com/forums/1/message.html?481879
>>>>
>>>>That was Beta12, now let's look at Beta13b.
>>>
>>>I'll be testing with Beta 13 and not 13b though. The reason is that I already
>>>have the results of the default settings of Beta 12/13, as well as others. There
>>>is no reason to presume that the parameter will work better with the hash change
>>>he made compared to others. Otherwise I'd have to re-run the default settings
>>>of 13b as well.
>>>
>>> Albert
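
The point Vas makes about unlisted candidate hypotheses can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything from the thread: the per-game probabilities (45% win, 30% draw), the 4.5-point margin, the treatment of the 20 hypotheses as independent 50-game comparisons, and the function names are all hypothetical choices made for illustration. Settings A and B are given identical strength, so any apparent advantage for B is pure luck. The sketch does not try to reproduce the 30% and 3% figures, which Vas himself calls wild guesses.

```python
import random

# Assumed per-game outcome probabilities (hypothetical, roughly a 60% scorer).
P_WIN, P_DRAW = 0.45, 0.30


def match_score(n_games, rng):
    """Total points from n_games, each game drawn independently."""
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < P_WIN:
            score += 1.0
        elif r < P_WIN + P_DRAW:
            score += 0.5
    return score


def chance_of_lucky_hypothesis(n_hypotheses, n_games=50, margin=4.5,
                               trials=1000, seed=0):
    """Estimate how often at least one of n_hypotheses shows setting B
    beating setting A by `margin` points, when A and B are equally strong."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for _ in range(n_hypotheses):
            a = match_score(n_games, rng)  # setting A, same true strength
            b = match_score(n_games, rng)  # setting B, same true strength
            if b - a >= margin:
                hits += 1
                break  # one lucky hypothesis is enough for this trial
    return hits / trials


if __name__ == "__main__":
    for k in (1, 20):
        print(k, "hypotheses:", chance_of_lucky_hypothesis(k))
```

Under these assumptions, a single hypothesis chosen before the games are played gets lucky far less often than the best of 20 picked afterwards, which is why the promising one still has to be re-tested "straight up" on fresh games.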
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.