Computer Chess Club Archives

Subject: Re: Results from the WT-5 tournament

Author: Ed Schröder
Date: 04:03:32 08/30/99
On August 29, 1999 at 18:39:00, Mark Young wrote:

>On August 29, 1999 at 17:12:45, Robert Hyatt wrote:
>
>>On August 29, 1999 at 16:51:02, Mark Young wrote:
>>
>>>On August 29, 1999 at 15:36:50, Robert Hyatt wrote:
>>>
>>>>On August 29, 1999 at 15:04:09, Frank Quisinsky wrote:
>>>>
>>>>>Hello Robert,
>>>>>
>>>>>>>example ...
>>>>>
>>>>>>>Crafty thinking for move 28 in the game
>>>>>>>02:58 13/02 move Ka1 without ponder
>>>>>>>02:20 13/04 move Ka1 with ponder
>>>>>
>>>>>>that makes no sense.  pondering saved 38 seconds?  It should save more like
>>>>>>2 minutes there.
>>>>>
>>>>>A bad example on my part, but what I mean is this: when Crafty gets 2 minutes
>>>>>more time, then with a 30% ponder hit rate it finds no more than 5 better
>>>>>moves. And the 5 moves Crafty plays without ponder need not be bad!
>>>>>
>>>>>And I would say this is not statistically relevant. Bob, you can see the
>>>>>rating list from Kai, Christian and me on the new WinBoard site. Crafty plays
>>>>>at 2494 ELO and Comet at 2445 ELO (over 500 games).
>>>>>
>>>>>And if I made a rating list on two PCs, I think Crafty would play at ~2500
>>>>>ELO and Comet at ~2450 ELO, plus 20-40 for ponder!
>>>>>
>>>>>And if Comet uses the time control better than Crafty, then Comet plays at 2440
>>>>>ELO and Crafty at 2500 ELO on one PC! Or would you say that Crafty plays more
>>>>>than 50 ELO better than Comet on one PC, or more than 80 ELO better than AnMon,
>>>>>looking at the rating list from Kai, Christian and me?
>>>>>
>>>>
>>>>You can believe what you want, and play matches any way you want.  I simply
>>>>told you that the way you are playing them is non-optimal.  Ed said the same
>>>>thing.  If you think you know my program better than I do, that's fine.  I
>>>>simply say that if you play crafty with ponder=off, you hurt it in ways you
>>>>do _not_ understand.  Some other programs may be hurt in the same way.  Some
>>>>may not.  When you mix a program that is hurt by this with one that is not,
>>>>the results get skewed.
>>>>
>>>>It _does_ affect Crafty.  That I am _certain_ of.  Other programs I have no
>>>>idea about, other than Ed said it hurts Rebel as well...
>>>>
>>>>>>>In move 29 in this game
>>>>>>>04:45 11/04 move Ka2 without ponder
>>>>>>>05:38 11/05 move Ka2 with ponder
>>>>>>
>>>>>>ditto...  it depends on how long the opponent thinks _after_ crafty
>>>>>>starts pondering...  If it thinks for the normal amount of time, crafty
>>>>>>gets that much think-time _free_.  And I've _never_ seen the prediction
>>>>>>rate below 50% against a computer, more commonly it is well above 50%.
>>>>>>The log file will show how many moves it correctly predicted, which will
>>>>>>tell how many times it could potentially save time.
>>>>>>
>>>>>>But you are totally missing the point Ed raised and I seconded:  if one
>>>>>>program has been tested and tuned for ponder=off play, and the other has
>>>>>>not, then that program has a significant advantage.  Tough luck, you say?
>>>>>>Of course... but then your results don't have anything to do with how the
>>>>>>two programs would perform on separate machines.
>>>>>
>>>>>Yes, I see that problem, Robert. And I must say that everything you write is
>>>>>correct!
>>>>>
>>>>>But you think ponder is worth 50-100 ELO and that the time control for matches
>>>>>on one machine is bad (you are the programmer, so you can say this), whereas I
>>>>>think ponder is worth 20-40 ELO, and I saw no time problems in Crafty when I
>>>>>looked at these matches with a longer time control. The engine with the better
>>>>>time control for matches on one PC has a minimal advantage, I think 10 ELO.
>>>>>This advantage is not relevant.
>>>>>
>>>>>>That is why we keep saying "don't run games on one computer...  the results
>>>>>>are not always as meaningful as you might assume..."
>>>>>
>>>>>And I say: play matches on one computer, and the results are still very good
>>>>>for statistics. And I am happy when users play tournaments with WinBoard and
>>>>>send me the data for the homepage run by Volker and me :-))
>>>>>
>>>>>>you are missing the point.  my time allocation _depends_ on saving time by
>>>>>>pondering.  You are not allowing it to do that.  Which is the problem with
>>>>>>this...  nobody would argue that _all_ engines are 50-100 elo stronger with
>>>>>>ponder=on than they are with ponder=off.  That is easily testable on a chess
>>>>>>server.  But the issue here is whether a program is tested with ponder=off or
>>>>>>not.  Mine isn't.  Ed's isn't.
>>>>>
>>>>>No, I do see this point!
>>>>>And I will not say no when the programmer says yes. I do not want to argue that
>>>>>way. But Robert, on this point I do not see 50-100 ELO when Crafty plays with a
>>>>>good time control under WinBoard.
>>>>>
>>>>>And another point is all engines, yes!
>>>>>
>>>>>OK, what can a programmer do with ponder? Ponder is ponder. Program A finds
>>>>>the best move in 10 seconds and plays it after 3 minutes, and program B finds
>>>>>the move after 3 minutes and plays it with ponder. Then program B has an
>>>>>advantage! And another advantage of ponder: learning?
>>>>>
>>>>
>>>>You are _still_ overlooking the point.  When crafty ponders, it builds up a
>>>>time 'surplus'.  It can use this in creative ways, to either search longer
>>>>when the position is unclear, or when the eval drops.  If it doesn't have this
>>>>'surplus' then it doesn't do these things in the same way.  And with no
>>>>pondering, it won't ever have a surplus.  Other assumptions made in the time
>>>>allocation are also incorrect with no pondering...
>>>>
>>>>So it isn't _just_ finding a better move when it ponders correctly that is the
>>>>issue here.. It is the _time saved_ on such moves that then influences _other_
>>>>moves in the game...  those you are ignoring..
>>>>
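A rough sketch of the "time surplus" argument above, under a deliberately simplified model that is not Crafty's actual time allocation: every move has the same target think time, the opponent always thinks for a fixed time, and a correct ponder prediction lets the engine do `min(target, opponent time)` of its search for free. The function and parameter names (`expected_surplus`, `hit_rate`, `target_s`, `opponent_s`) are hypothetical and the numbers are illustrative only.

```python
def expected_surplus(moves: int, hit_rate: float,
                     target_s: float, opponent_s: float) -> float:
    """Expected clock time (seconds) banked over `moves` moves by pondering.

    Simplified model (an assumption, not Crafty's real logic): on a correct
    prediction the engine has already searched for as long as the opponent
    was thinking, capped at its own target time for the move.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    saved_per_hit = min(target_s, opponent_s)
    return moves * hit_rate * saved_per_hit


# Illustrative numbers: 40 moves, 180 s per move for both sides.
with_ponder = expected_surplus(40, 0.5, 180.0, 180.0)     # 50% prediction rate
without_ponder = expected_surplus(40, 0.0, 180.0, 180.0)  # ponder=off
print(with_ponder, without_ponder)
```

In this model the surplus is exactly zero with ponder=off, so any allocation rule that spends banked time on unclear positions or on a dropping eval never gets to fire, which is the effect being described for the moves that were not pondered.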
>>>>>And servers ...
>>>>>This is right: on servers most games are blitz games. And there ponder is
>>>>>important at the moment.
>>>>>
>>>>>>generally 2x faster is 70 Elo better.  Pondering has the potential to make
>>>>>>a program act like it is twice as fast...
>>>>>
>>>>>Is 2x faster really 70 ELO better in general?
>>>>>
>>>>>In the last few years, I think!
>>>>>
>>>>>With this statement you are saying ...
>>>>>
>>>>>AMD K6-3  450 2500 ELO
>>>>>AMD K6-3  900 2570 ELO
>>>>>AMD K6-3 1800 2640 ELO
>>>>>AMD K6-3 3600 2710 ELO
>>>>>
>>>>>I think that if Crafty on an AMD K6-3 450 plays at 2500 ELO and reaches ply 13
>>>>>(tournament play), the AMD K6-3 at 3600 does not reach ply 18 for 2700 ELO!!!!
>>>>>
>>>>
>>>>your math is bad.  going from 450 to 3600 gets at most 2 plies.  It takes a
>>>>factor of 3x roughly to get another ply.  10x faster is roughly two plies
>>>>deeper.
>>>>
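The ply arithmetic above can be checked with a short sketch, assuming the rule of thumb stated in the post (roughly a factor of 3 in speed per extra ply). The function name `extra_plies` and the `factor_per_ply` parameter are hypothetical, and the 3x figure is taken from the post rather than measured.

```python
import math


def extra_plies(speedup: float, factor_per_ply: float = 3.0) -> float:
    """Extra search depth from a given speedup, if each additional ply
    costs `factor_per_ply` times more search effort (assumed constant)."""
    if speedup <= 0 or factor_per_ply <= 1:
        raise ValueError("speedup must be > 0 and factor_per_ply > 1")
    return math.log(speedup) / math.log(factor_per_ply)


# 450 MHz -> 3600 MHz is an 8x speedup; 10x is the post's second example.
print(round(extra_plies(3600 / 450), 2))
print(round(extra_plies(10), 2))
```

Under that assumption an 8x speedup gives a little under 2 extra plies and 10x gives about 2.1, which matches "at most 2 plies" and "roughly two plies deeper".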
>>>>And the 70 Elo works..  because the "Elo" we are talking about is _not_
>>>>the performance against humans, it is the performance between two identical
>>>>programs but one running 2x faster.  And that 2x faster program will win a
>>>>bunch more games, yet against humans the difference won't be nearly as
>>>>dramatic...
>>>>
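A minimal sketch of the "2x faster is about 70 Elo" rule, assuming the gain per doubling stays constant (exactly the extrapolation being questioned in this thread) and using the standard logistic Elo expectation for the self-play score. The names `elo_at_speed` and `expected_score` are hypothetical helpers, not anything from Crafty.

```python
import math


def elo_at_speed(base_elo: float, base_mhz: float, mhz: float,
                 elo_per_doubling: float = 70.0) -> float:
    """Rating extrapolated from clock speed, assuming a constant Elo gain
    per doubling of speed (an assumption taken from the post)."""
    return base_elo + elo_per_doubling * math.log2(mhz / base_mhz)


def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Reproduces the quoted 450 -> 3600 MHz table under the constant-gain assumption.
for mhz in (450, 900, 1800, 3600):
    print(mhz, round(elo_at_speed(2500, 450, mhz)))

# Score of the 2x-faster copy against the identical slower copy.
print(round(expected_score(70), 3))
```

A 70-point gap corresponds to scoring roughly 60% in a match between the two copies, which is the program-versus-program sense of "Elo" meant here and says nothing about results against humans.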
>>>>
>>>>
>>>>>On the AMD K6-3 at 3600 MHz Crafty reaches ply 15 and plays at 2625 ELO!
>>>>>
>>>>>>But suppose you take his car, and suddenly make him run with rain tires when he
>>>>>>hasn't in the past.  How do you think he'd do then?  No testing?  He'd be pretty
>>>>>>unlikely to even finish the race.  This is a common NASCAR problem in the USA.
>>>>>>There are many good rain tires, and some NASCAR races are on wet tracks.  But
>>>>>>the drivers don't use the rain tires because to quote one this week "It would
>>>>>>be on-the-job-training, because we can't have rain when we need it to test..."
>>>>>>
>>>>>>That is the point with chess.  You are testing the programs in a mode where _we_
>>>>>>don't test them.  Poor performance is not unexpected...
>>>>>
>>>>>Yes this is an good example :-))
>>>>>
>>>>>OK Bob, I play with many chess programs, and I have played with two computers
>>>>>and with one computer. My own ELO is not high enough for me to say it is 20-40
>>>>>ELO, but I can see that the programs with ponder play no more than 5 different
>>>>>moves in the games. And the 5 moves the engines play without ponder are not
>>>>>bad. So I would say that this is not important for statistics.
>>>>>
>>>>>Kind regards
>>>>>Frank
>>>>
>>>>Just note that I pointed out that you are looking _only_ at the moves that
>>>>were pondered correctly.  The time saved affects _every other move_ in the
>>>>game in different ways.  If you play thru the whole game with 2x the time per
>>>>move, you will find many places where it would have changed its mind if it had
>>>>had a little more time, which it would have had had pondering been enabled...
>>>
>>>I will not argue that not pondering changes a program's move selection. That is
>>>only logical. What is uncertain is whether the change in a few moves changes
>>>the outcome of the games in a one-computer engine vs engine test. The data I
>>>generated says no, the other data I have seen says no. I can only conclude at
>>>this time that the change is not as much as you imagine, for whatever reason
>>>that may be. And for sure that change is well below 50 elo points.
>>>
>>>Q: If the change in results is 50 to 100 elo points why are we not seeing this
>>>change in our results between the one-computer test and the tests run on two
>>>computers?
>>>
>>>You do not need hundreds of games to see a change that big.
>>
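For a feel of the sample sizes involved, here is a back-of-the-envelope sketch. It assumes the standard logistic Elo expectation, independent games, and no draws (draws lower the per-game variance, so this overstates the count somewhat). The helper names `expected_score` and `games_needed` are hypothetical, and `z = 1.96` is the usual two-sided 95% normal quantile.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, z: float = 1.96) -> int:
    """Rough number of games before the expected match score sits
    `z` standard errors above 50% (win/loss only, no draws assumed)."""
    if elo_diff <= 0:
        raise ValueError("elo_diff must be positive")
    p = expected_score(elo_diff)
    sigma = math.sqrt(p * (1.0 - p))  # per-game standard deviation
    return math.ceil((z * sigma / (p - 0.5)) ** 2)


for diff in (50, 100):
    print(diff, games_needed(diff))
```

Under these assumptions a 100-point shift shows up within a few dozen games, while a 50-point shift takes on the order of a couple of hundred.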
>>Because you are playing _both_ programs with ponder=off.  _Both_ are therefore
>>playing weaker...  The problem is going to show up when one program behaves
>>better with ponder=off than another one... that will exaggerate a difference
>>that isn't there in real life..
>
>Yes!!! I agree....as long as both play equally weak with ponder off and as
>long as both gain the same again with ponder back on. This is what I have found
>under the *chessbase* interface. I have yet to find a program that will win with
>ponder off but not win the same with ponder on playing under chessbase.
>
>I will not jump to the conclusion that other programs that I have not played are
>fine testing this way. But I have lots of games with the chessbase engines
>under both methods, and here I can see the results match very well using both
>methods of testing.

But think of this which IMO is the whole point of the discussion:

Results of engine-engine matches on 1 PC
----------------------------------------
Program_X   40 points  TPR 2600
Program_Y   36 points  TPR 2550
Program_Z   32 points  TPR 2500

and

Results of engine-engine matches on 2 PC's
------------------------------------------
Program_Y   40 points  TPR 2600
Program_Z   36 points  TPR 2550
Program_X   32 points  TPR 2500

This is a not so unlikely scenario, all because of time control and
the lack of the permanent brain.

And which one would be considered as the more valuable one?

Ed
Re: Results from the WT-5 tournament (and for Robert an Sorry) Frank Quisinsky 07:00:24 08/30/99
- Re: Results from the WT-5 tournament (PGN and log file, long message) Frank Quisinsky 07:52:32 08/30/99
  - Re: Results from the WT-5 tournament (PGN and log file, long message) Frank Quisinsky 13:03:11 08/30/99
    - Re: Results from the WT-5 tournament (PGN and log file, long message) Robert Hyatt 13:45:27 08/30/99
      - Re: Results from the WT-5 tournament (PGN and log file, long message) Frank Quisinsky 14:15:40 08/30/99
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.