Computer Chess Club Archives

Subject: Re: Opinions? A Crafty experiment...

Author: Robert Hyatt
Date: 10:43:57 05/26/04
On May 26, 2004 at 13:01:16, José Carlos wrote:

>On May 26, 2004 at 12:23:08, Robert Hyatt wrote:
>
>>On May 26, 2004 at 05:16:05, José Carlos wrote:
>>
>>>On May 25, 2004 at 20:15:58, Dann Corbit wrote:
>>>
>>>>On May 25, 2004 at 15:12:01, Russell Reagan wrote:
>>>>
>>>>>On May 25, 2004 at 14:33:31, Dann Corbit wrote:
>>>>>
>>>>>>I doubt that very much.  There are some engines that vary in strength with time
>>>>>>control, but it is generally at the blitz level where these transitions take
>>>>>>place.  An engine that scores 30% at G/40 will probably score 30% at G/120 and
>>>>>>at 40/2 against the same opponent.
>>>>>
>>>>>
>>>>>I'll test it. What engines would you like me to use?
>>>>>
>>>>>
>>>>>>I suspect that you saw it happen once or twice and are now extrapolating the
>>>>>>result in your mind.
>>>>>
>>>>>
>>>>>Yes, maybe. I need to test the idea some more.
>>>>>
>>>>>
>>>>>>If the effect were profound, wouldn't Crafty score 50% against Shredder in the
>>>>>>SSDF?
>>>>>
>>>>>
>>>>>I don't understand the reasoning here. The effect may only be subtle. I don't
>>>>>even know if it is testable in practical time.
>>>>>
>>>>>
>>>>>>The reason an engine might pick up strength at longer time controls is that it
>>>>>>has a better fundamental algorithm, but it is poorly microoptimized.
>>>>>
>>>>>
>>>>>What about diminishing returns? If we plotted the results of matches with
>>>>>respect to time (ex. 30%, 35%, 38%, etc.), what do the curves look like? At the
>>>>>beginning of the curve, the slow program with a superior algorithm won't fit the
>>>>>overall pattern, but I'm after the overall shape of the curve, where it levels
>>>>>off (or if it levels off), and things like that.
>>>>
>>>>Why will one program have diminishing returns and not the other?
>>>>There is no conclusive evidence that diminishing returns occur.  Citations:
>>>>"DarkThought Goes Deep"
>>>>"Crafty Goes Deep"
>>>>
>>>>>>A great painter paints a picture in a month.  The same painter paints a picture
>>>>>>in ten minutes.  I am guessing that the slower time of painting made a much
>>>>>>better picture.
>>>>>>
>>>>>>When I play a chess engine contest, I want the result to be art, not comedy.
>>>>>>For me (though not for the majority) high speed blitz games are a crime against
>>>>>>humanity.
>>>>>>
>>>>>>It is not the end point (who won?) that is interesting to me.  It is the journey
>>>>>>along the way.
>>>>>
>>>>>
>>>>>This is where we differ somewhat. I am not uninterested in the quality of the
>>>>>games, but I am more interested in the outcome of the match and finding out who
>>>>>is better. A G/30 match might be of lower quality, but in general it will
>>>>>probably produce the same winner as a G/120 match, don't you think?
>>>>
>>>>What you will see is how strong the program is on that hardware at G/30.
>>>>Chances are good that there is a correlation to how the program does on that
>>>>hardware at G/120.
>>>>
>>>>>I am thinking about this from the point of view of an engine developer. If I can
>>>>>reliably tell which engine is stronger in 1/10th of the time, without having to
>>>>>play G/120 matches for weeks, then that will benefit me greatly in finding out
>>>>>whether changes to the engine are improvements, and the engine will improve more
>>>>>quickly.
>>>>
>>>>The higher the speed of the games, the greater the amount of randomness.
>>>>At some point, I think it levels out.
>>>
>>>
>>>  This is an interesting point. I had never thought of it that way. So basically
>>>you say "faster implies more data and more randomness, and that probably levels
>>>out at some point". So an interesting experiment would be: try 1000 games at G1,
>>>100 games at G30 and 10 games at G120. The % of w/d/l should somehow be similar.
>>>Of course the numbers should be calculated more rigorously, I just made
>>>them up, but that's the idea. Do you know how to do the calculations (my
>>>mathematical background is not enough)?
>>>  Or we could do it the other way, that is, run 1000 games at G1. Then start a
>>>match at G30 (with at least n games) until results are similar in % to the
>>>first match. Then do the same with G120.
>>>  What do you think?
>>>
>>>  José C.
>>
>>I think the idea is flawed.
>>
>>Suppose you play two programs and limit them so they can only search to a depth
>>of 1 ply.  It becomes "static evaluation vs static evaluation".  If A has a
>>better evaluation, A wins.
>>
>>Suppose you now search for a long time, but A uses minimax (Just for a gross but
>>impractical example) and B uses alpha/beta.  B will probably win on tactics.
>>Short games favor good evaluation over tactics.  Longer games can give a program
>>a tactical edge over a smarter program...
>>
>>I am _certain_ that Crafty plays worse against the same program at blitz, as
>>opposed to playing the program in standard time controls.  From looking at
>>literally thousands of logs from ICC...
>
>
>  Yes, that's probably true for some programs (though I don't think in short
>games eval is more important than search.

Take this to a boundary condition where you do a 1 ply search and that is _all_.
All that will select moves is evaluation...  Another example: a _very_ slow
program that at blitz can only search exactly 1 move, while at longer time
controls it can search them all.  There will be a big difference between blitz
and non-blitz results.

I think that writing off the possibility of a program being better at blitz vs
standard (or vice-versa) is the wrong thing to do without a _lot_ of testing to
verify it...
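
To make the 1-ply boundary condition concrete, here is a minimal sketch. It uses
a made-up toy tree and plain fixed-depth negamax with no pruning, not Crafty's
actual search. The names Node, negamax and best_move and all the evaluation
numbers are assumptions for illustration only. At depth 1 the move is picked by
static evaluation alone; one more ply lets the search see the reply.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy position; static_eval is from the side to move's point of view."""
    static_eval: float
    children: dict = field(default_factory=dict)  # move name -> Node


def negamax(node, depth):
    """Plain fixed-depth negamax with no pruning (illustrative only)."""
    if depth == 0 or not node.children:
        return node.static_eval
    return max(-negamax(child, depth - 1) for child in node.children.values())


def best_move(node, depth):
    """Return the move whose resulting position scores best for the mover."""
    return max(node.children,
               key=lambda move: -negamax(node.children[move], depth - 1))


# Hypothetical position: grabbing a pawn looks best statically,
# but the opponent has a reply that wins material.
after_grab = Node(-1.0, {"fork": Node(-5.0)})
after_quiet = Node(-0.2, {"develop": Node(0.2)})
root = Node(0.0, {"grab_pawn": after_grab, "quiet": after_quiet})

print(best_move(root, 1))  # chosen by static evaluation alone
print(best_move(root, 2))  # one more ply sees the opponent's reply
```

In this toy tree the two depths pick different moves, which is the kind of
blitz vs non-blitz difference described above.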


> The opposite seems more logical to
>me). But Dann's point about noise still makes sense to me. Maybe the experiment
>could be done with Tiger (Christophe always claims it plays the same at all time
>controls) versus other program...
>
>  José C.
>
>
>
>
>>>>In a contest, I will spend a lot of time generating data.  I would like the data
>>>>to be valuable to me.
>>>>
>>>>>In that respect, I think longer games tell us less about which engine is better,
>>>>>and about whether a change was really an improvement. I may be wrong though. It
>>>>>is just an idea.
>>>>
>>>>I think that there is probably some happy medium for experimental quality (IOW,
>>>>to collect the most reliable data in the least amount of time).  But it probably
>>>>varies quite a bit from program to program and from machine to machine, etc.
>>>>
>>>>When I generate a chess contest, I want the data to be interesting enough for me
>>>>to read.  Who wins the contest is purely an afterthought for me.
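
On the question raised earlier in the thread of how to do the calculations for
matches of different lengths, the following is a rough sketch, not a definitive
method. It assumes games are independent and uses a normal approximation to put
an error margin on a match score. The function name score_with_margin, the
20%/20%/60% win/draw/loss split and z=1.96 (roughly 95% confidence) are all
assumptions made up for illustration.

```python
import math


def score_with_margin(wins, draws, losses, z=1.96):
    """Return (score, margin) with score in [0, 1].

    Each game counts 1, 0.5 or 0; the margin is z standard errors of the
    mean under a normal approximation (assumes independent games).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(variance / n)
    return score, margin


# Same hypothetical 20/20/60 split at three match lengths.
for games in (10, 100, 1000):
    wins = draws = games * 2 // 10
    losses = games - wins - draws
    score, margin = score_with_margin(wins, draws, losses)
    print(f"{games:5d} games: {score:.1%} +/- {margin:.1%}")
```

Under these assumptions the margin shrinks with the square root of the number
of games, so a 10-game match carries about ten times the uncertainty of a
1000-game match with the same score.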
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.