Computer Chess Club Archives


Subject: Re: Results from the WT-5 tournament

Author: Ed Schröder

Date: 04:03:32 08/30/99


On August 29, 1999 at 18:39:00, Mark Young wrote:

>On August 29, 1999 at 17:12:45, Robert Hyatt wrote:
>
>>On August 29, 1999 at 16:51:02, Mark Young wrote:
>>
>>>On August 29, 1999 at 15:36:50, Robert Hyatt wrote:
>>>
>>>>On August 29, 1999 at 15:04:09, Frank Quisinsky wrote:
>>>>
>>>>>Hello Robert,
>>>>>
>>>>>>>example ...
>>>>>
>>>>>>>Crafty thinking for move 28 in the game
>>>>>>>02:58 13/02 move Ka1 without ponder
>>>>>>>02:20 13/04 move Ka1 with ponder
>>>>>
>>>>>>that makes no sense.  pondering saved 38 seconds?  It should save more like
>>>>>>2 minutes there.
>>>>>
>>>>>A bad example from me, but I mean that when Crafty gets 2 minutes more time, with
>>>>>30% ponder hits it finds no more than 5 better moves. And the 5 moves that
>>>>>Crafty plays without ponder need not be bad!
>>>>>
>>>>>And I would say that this is not statistically relevant. Bob, you can see the
>>>>>rating list from Kai, Christian and me on the new WinBoard site. Crafty plays
>>>>>at 2494 ELO and Comet plays at 2445 ELO (over 500 games).
>>>>>
>>>>>And if I made a rating list on two PCs, I think Crafty would play at ~2500
>>>>>ELO and Comet at ~2450 ELO, plus 20-40 for ponder!
>>>>>
>>>>>And if Comet uses the time control better than Crafty, Comet plays at 2440 ELO
>>>>>and Crafty at 2500 ELO on one PC! Or will you say that Crafty plays more than
>>>>>50 ELO better than Comet on one PC, or more than 80 ELO better than AnMon, looking
>>>>>at the rating list from Kai, Christian and me?
>>>>>
>>>>
>>>>You can believe what you want, and play matches any way you want.  I simply
>>>>told you that the way you are playing them is non-optimal.  Ed said the same
>>>>thing.  If you think you know my program better than I do, that's fine.  I
>>>>simply say that if you play crafty with ponder=off, you hurt it in ways you
>>>>do _not_ understand.  Some other programs may be hurt in the same way.  Some
>>>>may not.  When you mix a program that is hurt by this with one that is not,
>>>>the results get skewed.
>>>>
>>>>It _does_ affect Crafty.  That I am _certain_ of.  Other programs I have no
>>>>idea about, other than Ed said it hurts Rebel as well...
>>>>
>>>>
>>>>
>>>>
>>>>>>>In move 29 in this game
>>>>>>>04:45 11/04 move Ka2 without ponder
>>>>>>>05:38 11/05 move Ka2 with ponder
>>>>>>
>>>>>>ditto...  it depends on how long the opponent thinks _after_ crafty
>>>>>>starts pondering...  If it thinks for the normal amount of time, crafty
>>>>>>gets that much think-time _free_.  And I've _never_ seen the prediction
>>>>>>rate below 50% against a computer, more commonly it is well above 50%.
>>>>>>The log file will show how many moves it correctly predicted, which will
>>>>>>tell how many times it could potentially save time.
>>>>>>
>>>>>>But you are totally missing the point Ed raised and I seconded:  if one
>>>>>>program has been tested and tuned for ponder=off play, and the other has
>>>>>>not, then that program has a significant advantage.  Tough luck, you say?
>>>>>>Of course... but then your results don't have anything to do with how the
>>>>>>two programs would perform on separate machines.
>>>>>
>>>>>Yes, I see that problem, Robert. And I must say that what you write is all
>>>>>correct!
>>>>>
>>>>>But you think ponder is worth 50-100 ELO and that the time control for matches on
>>>>>one machine is bad (you are the programmer, so you can say this), but I think
>>>>>ponder is worth 20-40 ELO, and I see no time problems in Crafty when I look at
>>>>>these matches with longer time controls. The engine with the better time control
>>>>>for matches on one PC has a minimal advantage, I think 10 ELO. This advantage is
>>>>>not relevant.
>>>>>
>>>>>>That is why we keep saying "don't run games on one computer...  the results
>>>>>>are not always as meaningful as you might assume..."
>>>>>
>>>>>And I say: play matches on one computer, and the results are still very good for
>>>>>statistics. And I am happy when users play tournaments with WinBoard and send me
>>>>>the data for the homepage of Volker and me :-))
>>>>>
>>>>>>you are missing the point.  my time allocation _depends_ on saving time by
>>>>>>pondering.  You are not allowing it to do that.  Which is the problem with
>>>>>>this...  nobody would argue that _all_ engines are 50-100 elo stronger with
>>>>>>ponder=on than they are with ponder=off.  That is easily testable on a chess
>>>>>>server.  But the issue here is whether a program is tested with ponder=off or
>>>>>>not.  Mine isn't.  Ed's isn't.
>>>>>
>>>>>No, I do see this point!
>>>>>And I will not say no when the programmer says yes. I do not want to argue about
>>>>>that. But Robert, on this point I do not see 50-100 ELO when Crafty plays with a
>>>>>good time control under WinBoard.
>>>>>
>>>>>And another point is all engines, yes!
>>>>>
>>>>>OK, what can a programmer do with ponder? Ponder is ponder. Program A finds
>>>>>the best move in 10 seconds and plays it after 3 minutes, and program B
>>>>>finds the move in 3 minutes and plays it with ponder. Then program B has
>>>>>an advantage! And another advantage of ponder: learning?
>>>>>
>>>>
>>>>
>>>>
>>>>You are _still_ overlooking the point.  When crafty ponders, it builds up a
>>>>time 'surplus'.  It can use this in creative ways, to either search longer
>>>>when the position is unclear, or when the eval drops.  If it doesn't have this
>>>>'surplus' then it doesn't do these things in the same way.  And with no
>>>>pondering, it won't ever have a surplus.  Other assumptions made in the time
>>>>allocation are also incorrect with no pondering...
>>>>
>>>>So it isn't _just_ finding a better move when it ponders correctly that is the
>>>>issue here.. It is the _time saved_ on such moves that then influences _other_
>>>>moves in the game...  those you are ignoring..
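
The 'surplus' argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Crafty's actual time allocation: the function names, the fixed per-move target, and the rule that a correct prediction saves at most the opponent's thinking time are all hypothetical simplifications.

```python
def ponder_surplus(moves, hit_rate, own_target, opponent_time):
    """Expected clock time (seconds) banked by pondering over a game.

    Toy assumption: on a correct prediction the engine has already searched
    for `opponent_time` seconds for free, so it saves at most that much of
    its own `own_target` seconds on that move.
    """
    saved_per_hit = min(own_target, opponent_time)
    return moves * hit_rate * saved_per_hit


def effective_time_per_move(moves, hit_rate, own_target, opponent_time):
    """Average search time per move once the surplus is spread over the game."""
    surplus = ponder_surplus(moves, hit_rate, own_target, opponent_time)
    return own_target + surplus / moves


if __name__ == "__main__":
    # Hypothetical numbers: 40 moves, 180 s per move for both sides.
    for rate in (0.0, 0.5):
        avg = effective_time_per_move(40, rate, 180.0, 180.0)
        print(f"hit rate {rate:.0%}: about {avg:.0f} s of search per move")
```

With a hit rate of 0 (ponder=off) the surplus is zero by construction, so any time-allocation logic that counts on it never gets to use it, which is the situation described above.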
>>>>
>>>>>And Server ...
>>>>>That is right; on servers most games are blitz games. And there ponder is
>>>>>important at the moment.
>>>>>
>>>>>>generally 2x faster is 70 Elo better.  Pondering has the potential to make
>>>>>>a program act like it is twice as fast...
>>>>>
>>>>>Is it generally true that 2x faster is 70 ELO better?
>>>>>
>>>>>In the last years I think !
>>>>>
>>>>>You say with this statement ...
>>>>>
>>>>>AMD K6-3  450 2500 ELO
>>>>>AMD K6-3  900 2570 ELO
>>>>>AMD K6-3 1800 2640 ELO
>>>>>AMD K6-3 3600 2710 ELO
>>>>>
>>>>>I think that if Crafty on an AMD K6-3 450 plays at 2500 ELO and reaches ply 13
>>>>>(tournament play), the AMD K6-3 at 3600 will not reach ply 18 for 2700 ELO!!!!
>>>>>
>>>>
>>>>your math is bad.  going from 450 to 3600 gets at most 2 plies.  It takes a
>>>>factor of 3x roughly to get another ply.  10x faster is roughly two plies
>>>>deeper.
>>>>
>>>>And the 70 Elo works..  because the "Elo" we are talking about is _not_
>>>>the performance against humans, it is the performance between two identical
>>>>programs but one running 2x faster.  And that 2x faster program will win a
>>>>bunch more games, yet against humans the difference won't be nearly as
>>>>dramatic...
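
The two rules of thumb quoted above (roughly 70 Elo per doubling of speed, roughly a factor of 3 in speed per extra ply) can be put into numbers. The sketch below only illustrates that arithmetic; the function names are hypothetical and the constants are the figures from this thread, not measured values.

```python
import math

ELO_PER_DOUBLING = 70.0  # "generally 2x faster is 70 Elo better"
SPEEDUP_PER_PLY = 3.0    # "a factor of 3x roughly to get another ply"


def elo_gain(speedup):
    """Estimated self-play Elo gain for a given speed ratio."""
    return ELO_PER_DOUBLING * math.log2(speedup)


def extra_plies(speedup):
    """Estimated additional search depth for a given speed ratio."""
    return math.log(speedup) / math.log(SPEEDUP_PER_PLY)


if __name__ == "__main__":
    speedup = 3600 / 450  # the 450 MHz to 3600 MHz step discussed above
    print(f"{speedup:.0f}x faster: +{elo_gain(speedup):.0f} Elo, "
          f"+{extra_plies(speedup):.2f} plies")
```

For the 450 to 3600 step this gives about 210 Elo (three doublings) and a little under 2 extra plies, which matches both the 2500 to 2710 table and the "at most 2 plies" remark.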
>>>>
>>>>
>>>>
>>>>>On the AMD K6-3 at 3600 MHz, Crafty reaches ply 15 and plays at 2625 ELO!
>>>>>
>>>>>>But suppose you take his car, and suddenly make him run with rain tires when he
>>>>>>hasn't in the past.  How do you think he'd do then?  No testing?  He'd be pretty
>>>>>>unlikely to even finish the race.  This is a common NASCAR problem in the USA.
>>>>>>There are many good rain tires, and some NASCAR races are on wet tracks.  But
>>>>>>the drivers don't use the rain tires because to quote one this week "It would
>>>>>>be on-the-job-training, because we can't have rain when we need it to test..."
>>>>>>
>>>>>>That is the point with chess.  You are testing the programs in a mode where _we_
>>>>>>don't test them.  Poor performance is not unexpected...
>>>>>
>>>>>Yes, this is a good example :-))
>>>>>
>>>>>OK Bob, I play with many chess programs, and I have played with two computers and
>>>>>with one computer. My own ELO is not high enough for me to say it is 20-40 ELO,
>>>>>but I can see that programs with ponder play no more than 5 different moves in
>>>>>the games. And the 5 moves that the engines play without ponder are not bad. So I
>>>>>would say that this is not important for the statistics.
>>>>>
>>>>>Kind regards
>>>>>Frank
>>>>
>>>>
>>>>Just note that I pointed out that you are looking _only_ at the moves that
>>>>were pondered correctly.  The time saved affects _every other move_ in the
>>>>game in different ways.  If you play thru the whole game with 2x the time per
>>>>move, you will find many places where it would have changed its mind if it had
>>>>had a little more time, which it would have had had pondering been enabled...
>>>
>>>I will not argue that not pondering changes a program's move selection. That is
>>>only logical. What is uncertain is whether the change in a few moves changes the
>>>outcome of the games in a one-computer engine vs engine test. The data I
>>>generated says no, and the other data I have seen says no. I can only conclude at
>>>this time that the change is not as much as you imagine, for whatever reason that
>>>may be. And for sure that change is well below 50 elo points.
>>>
>>>Q: If the change in results is 50 to 100 elo points why are we not seeing this
>>>change in our results between the one-computer test and the tests run on two
>>>computers?
>>>
>>>You do not need hundreds of games to see a change that big.
>>
>>Because you are playing _both_ programs with ponder=off.  _Both_ are therefore
>>playing weaker...  The problem is going to show up when one program behaves
>>better with ponder=off than another one... that will exaggerate a difference
>>that isn't there in real life..
>
>Yes!!! I agree....as long as both play equally as weak with ponder off and as
>long as both again gain the same with ponder back on. This is what I have found
>under the *chessbase* interface. I have yet to find a program that will win with
>ponder off but not win the same with ponder on playing under chessbase.
>
>I will not jump to the conclusion that other programs that I have not played are
>fine testing this way. But I have lots of games with the chessbase engines
>under both methods, and here I can see the results match very well using both
>methods of testing.

But think of this which IMO is the whole point of the discussion:

Results of engine-engine matches on 1 PC
----------------------------------------
Program_X   40 points  TPR 2600
Program_Y   36 points  TPR 2550
Program_Z   32 points  TPR 2500

and

Results of engine-engine matches on 2 PC's
------------------------------------------
Program_Y   40 points  TPR 2600
Program_Z   36 points  TPR 2550
Program_X   32 points  TPR 2500

This is a not so unlikely scenario, all because of time control and
the lack of the permanent brain.

And which one would be considered as the more valuable one?

Ed



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.