Author: Mark Young
Date: 15:39:00 08/29/99
On August 29, 1999 at 17:12:45, Robert Hyatt wrote:

>On August 29, 1999 at 16:51:02, Mark Young wrote:
>
>>On August 29, 1999 at 15:36:50, Robert Hyatt wrote:
>>
>>>On August 29, 1999 at 15:04:09, Frank Quisinsky wrote:
>>>
>>>>Hello Robert,
>>>>
>>>>>>example ...
>>>>
>>>>>>Crafty thinking for move 28 in the game
>>>>>>02:58 13/02 move Ka1 without ponder
>>>>>>02:20 13/04 move Ka1 with ponder
>>>>
>>>>>that makes no sense. pondering saved 38 seconds? It should save more like
>>>>>2 minutes there.
>>>>
>>>>An bad example from me, but I mean that when Crafty 2 minutes more time Crafty
>>>>found in 30% ponder hints not more then 5 avoidable better moves. And this 5
>>>>moves which play Crafty without ponder must not been bad !
>>>>
>>>>And I will say that this is not for an statistic relevant. Bob you can see the
>>>>rating list from Kai, Christian and me of the new WinBoard site. Crafty play
>>>>with 2494 ELO and Comet play with 2445 ELO (over 500 games).
>>>>
>>>>And when I make an rating list on two PCs I think that Crafty play with ~ 2500
>>>>ELO and Comet with ~ 2450 ELO + 20-40 for ponder !
>>>>
>>>>And when Comet the time control better use then Crafty play Comet with 2440 ElO
>>>>and Crafty with 2500 ELO on one PC ! Or will you say that Crafty play more than
>>>>50 ELO better then Comet on one PC or better than 80 ELO by AnMon, looked in the
>>>>ratinglist from Kai, Christian and me ?
>>>>
>>>
>>>You can believe what you want, and play matches any way you want. I simply
>>>told you that the way you are playing them is non-optimal. Ed said the same
>>>thing. If you think you know my program better than I do, that's fine. I
>>>simply say that if you play crafty with ponder=off, you hurt it in ways you
>>>do _not_ understand. Some other programs may be hurt in the same way. Some
>>>may not. When you mix a program that is hurt by this with one that is not,
>>>the results get skewed.
>>>
>>>It _does_ affect Crafty. That I an _certain_ of.
>>>Other programs I have no idea about, other than Ed said it hurts Rebel as well...
>>>
>>>>>>In move 29 in this game
>>>>>>04:45 11/04 move Ka2 without ponder
>>>>>>05:38 11/05 move Ka2 with ponder
>>>>>
>>>>>ditto... it depends on how long the opponent thinks _after_ crafty
>>>>>starts pondering... If it thinks for the normal amount of time, crafty
>>>>>gets that much think-time _free_. And I've _never_ seen the prediction
>>>>>rate below 50% against a computer, more commonly it is well above 50%.
>>>>>The log file will show how many moves it correctly predicted, which will
>>>>>tell how many times it could potentially save time.
>>>>>
>>>>>But you are totally missing the point Ed raised and I seconded: if one
>>>>>program has been tested and tuned for ponder=off play, and the other has
>>>>>not, then that program has a significant advantage. Tough luck, you say?
>>>>>Of course... but then your results don't have anything to do with how the
>>>>>two programs would perform on separate machines.
>>>>
>>>>Yes I see that problem Robert. And I must say this is all correct what you
>>>>writing !
>>>>
>>>>But you think ponder make 50-100 and the time control for matches on one machine
>>>>is bad (I mean, you are the programmer and you can this say) but I think ponder
>>>>is 20-40 ELO and I see not time problems in Crafty when I looked this matches
>>>>with longer time control. The engine which had an better time control for
>>>>matches on one PC had an minmal advantage, I think 10 ELO. This advantage is not
>>>>relevant.
>>>>
>>>>>That is why we keep saying "don't run games on one computer... the results
>>>>>are not always as meaningful as you might assume..."
>>>>
>>>>And I say play matches on one Computer than the results are for a statistic very
>>>>good. And I am happy when user play tournament with Winboard and send me this
>>>>data for the homepage from volker and me :-))
>>>>
>>>>>you are missing the point.
>>>>>my time allocation _depends_ on saving time by
>>>>>pondering. You are not allowing it to do that. Which is the problem with
>>>>>this... nobody would argue that _all_ engines are 50-100 elo stronger with
>>>>>ponder=on than they are with ponder=off. That is easily testable on a chess
>>>>>server. But the issue here is whether a program is tested with ponder=off or
>>>>>not. Mine isn't. Ed's isn't.
>>>>
>>>>No I see this point !
>>>>And I will not say no when the programmer say yes. I will not so discussion. But
>>>>Robert in this point I see not 50-100 ELO, when Crafty play with an good time
>>>>control under WinBoard.
>>>>
>>>>And another point is all engines, yes !
>>>>
>>>>OK what can an programmer make with ponder. Ponder is ponder. Programm A found
>>>>the best moves in 10 seconds and play this moves in 3 minutes and programm B
>>>>found the move in 3 minutes and play this move with ponder. Then had programm B
>>>>an advantage ! And another advantage for ponder, learning ?
>>>>
>>>
>>>You are _still_ overlooking the point. When crafty ponders, it builds up a
>>>time 'surplus'. It can use this in creative ways, to either search longer
>>>when the position is unclear, or when the eval drops. If it doesn't have this
>>>'surplus' then it doesn't do these things in the same way. And with no
>>>pondering, it won't ever have a surplus. Other assumptions made in the time
>>>allocation are also incorrect with no pondering...
>>>
>>>So it isn't _just_ finding a better move when it ponders correctly that is the
>>>issue here.. It is the _time saved_ on such moves that then influences _other_
>>>moves in the game... those you are ignoring..
>>>
>>>>And Server ...
>>>>This is right, on Server the most games are blitz games. And here is ponder at
>>>>the moment importent.
>>>>
>>>>>generally 2x faster is 70 Elo better. Pondering has the potential to make
>>>>>a program act like it is twice as fast...
>>>>
>>>>Is this gereally 2xfaster 70 ELO better ?
>>>>
>>>>In the last years I think !
>>>>
>>>>You say with this statement ...
>>>>
>>>>AMD K6-3 450 2500 ELO
>>>>AMD K6-3 900 2570 ELO
>>>>AMD K6-3 1800 2640 ELO
>>>>AMD K6-3 3600 2710 ELO
>>>>
>>>>I think when Crafty on an AMD K6-3 450 play with 2500 ELO and come in Ply 13
>>>>(tournament play) the AMD K6-3 with 3600 come not in play 18 for 2700 ELO !!!!
>>>>
>>>
>>>your math is bad. going from 450 to 3600 gets at most 2 plies. It takes a
>>>factor of 3x roughly to get another ply. 10x faster is roughly two plies
>>>deeper.
>>>
>>>And the 70 Elo works.. because the "Elo" we are talking about is _not_
>>>the performance against humans, it is the performance between two identical
>>>programs but one running 2x faster. And that 2x faster program will win a
>>>bunch more games, yet against humans the difference won't be nearly as
>>>dramatic...
>>>
>>>>The AMD K6-3 with 3600 MHz come Crafty in Ply 15 and play with 2625 ELO !
>>>>
>>>>>But suppose you take his car, and suddenly make him run with rain tires when he
>>>>>hasn't in the past. How do you think he'd do then? No testing? He'd be pretty
>>>>>unlikely to even finish the race. This is a common NASCAR problem in the USA.
>>>>>There are many good rain tires, and some NASCAR races are on wet tracks. But
>>>>>the drivers don't use the rain tires because to quote one this week "It would
>>>>>be on-the-job-training, because we can't have rain when we need it to test..."
>>>>>
>>>>>That is the point with chess. You are testing the programs in a mode where _we_
>>>>>don't test them. Poor performance is not unexpected...
>>>>
>>>>Yes this is an good example :-))
>>>>
>>>>OK Bob, I play with many chess programs and I have play with two computers and
>>>>with one computer. My ELO is not so big than I can say it is 20-40 ELO, but I
>>>>can see that the programs with ponder not play more than 5 another moves in the
>>>>games. And this 5 moves which the engines play without ponder are not bad.
>>>>So I will say that this is not importent for an statistic.
>>>>
>>>>Kind regards
>>>>Frank
>>>
>>>Just note that I pointed out that you are looking _only_ at the moves that
>>>were pondered correctly. The time saved affects _every other move_ in the
>>>game in different ways. If you play thru the whole game with 2x the time per
>>>move, you will find many places where it would have changed its mind if it had
>>>had a little more time, which it would have had had pondering been enabled...
>>
>>I will not argue that not pondering changes a programs move selection. That is
>>only logical. What is uncertain is will the change in a few moves changes the
>>outcome of the games in a one computer engine vs engine test. The data I
>>generated says no, the other data I have seen says no. I can only conclude at
>>this time the change is not a much as you imagine for what ever reason that may
>>be. And for sure that change is well below 50 elo points.
>>
>>Q: If the change in results is 50 to 100 elo points why are we not seeing this
>>change in our results between the one-computer test and the tests run on two
>>computers?
>>
>>You do not need hundreds of games to see a change that big.
>
>Because you are playing _both_ programs with ponder=off. _Both_ are therefore
>playing weaker... The problem is going to show up when one program behaves
>better with ponder=off than another one... that will exaggerate a difference
>that isn't there in real life..

Yes!!! I agree... as long as both play equally weak with ponder off, and as long
as both gain the same amount with ponder back on. This is what I have found
under the *chessbase* interface. I have yet to find a program that wins with
ponder off but does not win the same with ponder on when playing under
chessbase. I will not jump to the conclusion that other programs, which I have
not played, are fine to test this way.
But I do have lots of games with the chessbase engines under both methods, and
there I can see that the results of the two testing methods match very well.
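
As a quick check on the speed figures quoted above, here is a minimal Python sketch. It assumes Bob's two rules of thumb from the thread (about 70 Elo per doubling of speed in engine-vs-engine play, and about one extra ply per 3x speedup). The function names are invented for illustration, and the numbers are rough estimates, not measurements.

```python
import math

# Assumptions taken from the quoted discussion, not measured values:
ELO_PER_DOUBLING = 70.0  # ~70 Elo per 2x speed (engine vs. engine)
SPEEDUP_PER_PLY = 3.0    # ~3x speed needed for one extra ply


def elo_gain(speedup):
    """Estimated Elo gain for a given speed factor (hypothetical helper)."""
    return ELO_PER_DOUBLING * math.log2(speedup)


def extra_plies(speedup):
    """Estimated extra search depth for a given speed factor (hypothetical helper)."""
    return math.log(speedup) / math.log(SPEEDUP_PER_PLY)


if __name__ == "__main__":
    base_mhz, base_elo = 450, 2500  # Frank's starting point in the thread
    for mhz in (450, 900, 1800, 3600):
        s = mhz / base_mhz
        print(f"{mhz:5d} MHz  {s:4.1f}x  "
              f"~{base_elo + elo_gain(s):.0f} Elo  +{extra_plies(s):.2f} plies")
```

Under these assumptions, 450 to 3600 MHz is an 8x speedup, which is three doublings (about 210 Elo, as in Frank's table) but a little under two extra plies, which is where the two sides of the argument above part ways.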