Computer Chess Club Archives


Subject: Re: Opinions? A Crafty experiment...

Author: Robert Hyatt

Date: 11:48:29 05/26/04


On May 26, 2004 at 14:30:30, Uri Blass wrote:

>On May 26, 2004 at 13:46:51, Robert Hyatt wrote:
>
>>On May 26, 2004 at 12:46:09, Uri Blass wrote:
>>
>>>On May 26, 2004 at 12:23:08, Robert Hyatt wrote:
>>>
>>>>On May 26, 2004 at 05:16:05, José Carlos wrote:
>>>>
>>>>>On May 25, 2004 at 20:15:58, Dann Corbit wrote:
>>>>>
>>>>>>On May 25, 2004 at 15:12:01, Russell Reagan wrote:
>>>>>>
>>>>>>>On May 25, 2004 at 14:33:31, Dann Corbit wrote:
>>>>>>>
>>>>>>>>I doubt that very much.  There are some engines that vary in strength with time
>>>>>>>>control, but it is generally at the blitz level where these transitions take
>>>>>>>>place.  An engine that scores 30% at G/40 will probably score 30% at G/120 and
>>>>>>>>at 40/2 against the same opponent.
>>>>>>>
>>>>>>>
>>>>>>>I'll test it. What engines would you like me to use?
>>>>>>>
>>>>>>>
>>>>>>>>I suspect that you saw it happen once or twice and are now extrapolating the
>>>>>>>>result in your mind.
>>>>>>>
>>>>>>>
>>>>>>>Yes, maybe. I need to test the idea some more.
>>>>>>>
>>>>>>>
>>>>>>>>If the effect were profound, wouldn't Crafty score 50% against Shredder in the
>>>>>>>>SSDF?
>>>>>>>
>>>>>>>
>>>>>>>I don't understand the reasoning here. The effect may only be subtle. I don't
>>>>>>>even know if it is testable in practical time.
>>>>>>>
>>>>>>>
>>>>>>>>The reason an engine might pick up strength at longer time controls is that it
>>>>>>>>has a better fundamental algorithm, but it is poorly microoptimized.
>>>>>>>
>>>>>>>
>>>>>>>What about diminishing returns? If we plotted the results of matches with
>>>>>>>respect to time (ex. 30%, 35%, 38%, etc.), what do the curves look like? At the
>>>>>>>beginning of the curve, the slow program with a superior algorithm won't fit the
>>>>>>>overall pattern, but I'm after the overall shape of the curve, where it levels
>>>>>>>off (or if it levels off), and things like that.
>>>>>>
>>>>>>Why will one program have diminishing returns and not the other?
>>>>>>There is no conclusive evidence that diminishing returns occur.  Citations:
>>>>>>"Dark Thought Goes Deep"
>>>>>>"Crafty Goes Deep"
>>>>>>
>>>>>>>>A great painter paints a picture in a month.  The same painter paints a picture
>>>>>>>>in ten minutes.  I am guessing that the slower time of painting made a much
>>>>>>>>better picture.
>>>>>>>>
>>>>>>>>When I play a chess engine contest, I want the result to be art, not comedy.
>>>>>>>>For me (though not for the majority) high speed blitz games are a crime against
>>>>>>>>humanity.
>>>>>>>>
>>>>>>>>It is not the end point (who won?) that is interesting to me.  It is the journey
>>>>>>>>along the way.
>>>>>>>
>>>>>>>
>>>>>>>This is where we differ somewhat. I am not uninterested in the quality of the
>>>>>>>games, but I am more interested in the outcome of the match and finding out who
>>>>>>>is better. A G/30 match might be of lower quality, but in general it will
>>>>>>>probably produce the same winner as a G/120 match, don't you think?
>>>>>>
>>>>>>What you will see is how strong the program is on that hardware at G/30.
>>>>>>Chances are good that there is a correlation to how the program does on that
>>>>>>hardware at G/120.
>>>>>>
>>>>>>>I am thinking about this from the point of view of an engine developer. If I can
>>>>>>>reliably tell which engine is stronger in 1/10th of the time, without having to
>>>>>>>play G/120 matches for weeks, then that will benefit me greatly in finding out
>>>>>>>whether changes to the engine are improvements, and the engine will improve more
>>>>>>>quickly.
>>>>>>
>>>>>>The faster the games, the greater the amount of randomness.  At some point,
>>>>>>I think it levels out.
>>>>>
>>>>>
>>>>>  This is an interesting point. I had never thought of it that way. So basically
>>>>>you say "faster implies more data and more randomness, and that probably levels
>>>>>out at some point". So an interesting experiment would be: try 1000 games at G1,
>>>>>100 games at G30 and 10 games at G120. The % of w/d/l should somehow be similar.
>>>>>Of course the numbers should be calculated in a more elaborate way, I just made
>>>>>them up, but that's the idea. Do you know how to do the calculations (my
>>>>>mathematical background is not enough)?
>>>>>  Or we could do it the other way, that is, run 1000 games at G1. Then start a
>>>>>match at G30 (with at least n games) until results are similar in % to the
>>>>>first match. Then do the same with G120.
>>>>>  What do you think?
>>>>>
>>>>>  José C.
>>>>
>>>>I think the idea is flawed.
>>>>
>>>>Suppose you play two programs and limit them so they can only search to a depth
>>>>of 1 ply.  It becomes "static evaluation vs static evaluation".  If A has a
>>>>better evaluation, A wins.
>>>>
>>>>Suppose you now search for a long time, but A uses minimax (Just for a gross but
>>>>impractical example) and B uses alpha/beta.  B will probably win on tactics.
>>>>Short games favor good evaluation over tactics.  Longer games can give a program
>>>>a tactical edge over a smarter program...
>>>>
>>>>I am _certain_ that Crafty plays worse against the same program at blitz, as
>>>>opposed to playing the program in standard time controls.  From looking at
>>>>literally thousands of logs from ICC...
>>>
>>>Crafty against which program?
>>>Is not the answer dependent on the name of the opponent program?
>>>
>>>Uri
>>
>>
>>Not particularly.  But in general I am talking about commercial programs.  You
>>could look at some stats on ICC for example.  It simply seems to play better at
>>longer time controls...
>
>I cannot use the search command because I am not a member of ICC, so I only
>looked at the ratings.
>
>I see that you have 3241 at bullet, 2929 at blitz and 2637 at standard.
>
>
>Bullet      3241  [8]  6675  1468  1088  9231   3286 (27-Dec-2002)
>Blitz       2929      59626 17733 14094 91453   3388 (09-Jun-2000)
>Standard    2637       5281  2747  2442 10470   2792 (25-Oct-2000)
>
>
>For comparison
>
>
>DeepFritz
>
>Bullet      2958  [8]    40     2     5    47   3003 (02-Jan-2002)
>Blitz       3038  [8]    68    11    10    89   3038 (25-Aug-2002)
>Standard    2746  [6]    74    15    20   109   2801 (27-Jan-2001)
>
>
>Rebel12
>
>Bullet      2190  [8]     1     0     1     2
>Blitz       2774       3467  2416  1849  7732   3018 (26-Apr-2004)
>Standard    2551        169   190   133   492   2677 (27-Aug-2003)
>
>
>I do not see a tendency to do better at longer time controls relative to the
>commercial programs, based on that data.
>
>Uri


Comps don't play bullet vs Crafty, so that rating comes mainly from humans, and
they don't play it much any longer.  You can't really look at the bullet ratings.
In fact, the bare ratings don't say much.  Crafty plays mostly comps at standard,
whereas the blitz rating is a composite from playing both comps and humans.

I think you have to look at results vs individual opponents, for blitz and
standard time controls, to see if my intuition is really correct...
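
One way to make that per-opponent comparison concrete is sketched below. It is
only a rough illustration: the function names and the win/loss/draw counts are
made up (they are not ICC data), and it assumes the games are independent so
that a simple standard error on the score fraction is meaningful.

```python
import math

def score_stats(wins, losses, draws):
    """Return (score fraction, standard error) for one opponent at one time control."""
    n = wins + losses + draws
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is 1, 0.5 or 0.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    return score, math.sqrt(variance / n)

def compare_time_controls(blitz, standard):
    """Return (score change, its standard error, z) going from blitz to standard."""
    s_b, se_b = score_stats(*blitz)
    s_s, se_s = score_stats(*standard)
    diff = s_s - s_b
    se_diff = math.sqrt(se_b ** 2 + se_s ** 2)
    z = diff / se_diff if se_diff > 0 else float("nan")
    return diff, se_diff, z

# Hypothetical (wins, losses, draws) against ONE opponent; not real ICC data.
blitz_vs_x = (30, 60, 10)
standard_vs_x = (45, 45, 10)

diff, se_diff, z = compare_time_controls(blitz_vs_x, standard_vs_x)
print(f"score change: {diff:+.3f} +/- {se_diff:.3f} (z = {z:.2f})")
```

If |z| is well under 2, the change between time controls is within the noise.
The same arithmetic shows why a handful of long games says little: with only 10
games the standard error on the score can be as large as about 0.16.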

I would also add that bullet ratings are pretty meaningless from a "number"
point of view.  Humans simply can't cope at that speed.  Blitz is hard enough: a
good GM might win 1 of every 10 games and draw 3-4 of every 10.  That pushes a
program's blitz and bullet ratings into the lunar orbit range.
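
As a rough worked example of that arithmetic, the sketch below converts the
"1 win and 3-4 draws out of 10" figure into a rating gap. It assumes the
standard logistic Elo expectation formula; ICC's actual rating calculation may
differ, and the function name is just for illustration.

```python
import math

def elo_diff_from_score(score):
    """Rating gap implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# GM result per 10 games: 1 win plus 3 or 4 draws (figures from the post above).
gm_score_low = (1 + 0.5 * 3) / 10
gm_score_high = (1 + 0.5 * 4) / 10

# The program's score is the complement of the GM's score.
for gm_score in (gm_score_low, gm_score_high):
    program_score = 1.0 - gm_score
    gap = elo_diff_from_score(program_score)
    print(f"GM scores {gm_score:.2f} -> program is about {gap:.0f} points higher")
```

Under that assumption the program lands somewhere around 150 to 190 points
above a good GM at blitz, and the gap against weaker human opponents is larger
still.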


