Computer Chess Club Archives


Subject: Re: Junior's long lines: more data about this....

Author: Robert Hyatt

Date: 07:43:12 01/01/98


On January 01, 1998 at 01:45:12, Don Dailey wrote:

>>>Here is how I do SELF testing and some of my thoughts on it.
>>>
>>>I have 200 very shallow openings and they go to exactly 5 moves for
>>>each player or 10 ply.  Larry Kaufman picked them so that they are
>>>all relatively normal, span a lot of theory but do not get you too
>>>deeply into the game (so we test opening heuristics too.)
>>>
>>>200 openings gives us a total of 400 games, so we try to test in
>>>batches of 400 games.
>>
>>How long does a 400-game match take to get accurate results?
>
>I often start with very fast games so I can get lots of results very
>quickly.   Often a new algorithm will get badly beaten and it will
>make no real sense to continue.  But when everything seems ok and not
>too one sided I migrate to longer and longer games.
>
>Here is an interesting topic we should discuss:  How meaningful is
>it to test at much faster time controls than actual tournament time
>controls?  Here is my sense of this subject:
>
>At one time I believed it to be very important to do lots of testing
>at tournament time controls, after all, that is what you are trying
>to optimize.   But my opinion on this now is that it is (mostly) a
>waste of time.   On virtually every test I ever do, I get the same
>results on average.   I will test levels that vary from 1/2 sec or
>less per move up to 2 or 3 minutes per move and the better
>algorithm tends to win at ALL levels with about an equal ratio.  The
>only trend I have ever noticed is that really tactical algorithms
>like wild move extensions will tend to do much better on really low
>levels.  But after 3 or 4 ply it does not seem to make a bit of
>difference.  I have yet to find an algorithm that scores better at
>fast time and worse at long (or vice versa.)
>Even with the wild tests it was more like 55% at 2 and 3 ply, 51%
>at everything higher!
>
>Now there are a few that SEEM to have this behaviour until I check
>it out more thoroughly and get bigger sample sizes and test at even
>higher levels.  It always turns out to be statistical noise I saw.
>
>So my current belief is that you need to search deep enough to get
>beyond about 3 ply and after this it will not matter much at all.
>Even if there is a very slight effect it's cancelled out by the
>small sample sizes you are limited to with long games.  If I had
>to test at tournament time controls it would take 2 months to get
>in 300 or 400 games.  This is ridiculous and I don't consider it
>enough games anyway.  I feel like it's a fortunate accident that
>this kind of testing may not be all that necessary.
>
>Occasionally I "bite the bullet" and go for some longer testing
>but it has yet to show me something I couldn't see with the short
>testing.   Larry Kaufman once believed (I assume he still does)
>that game in 10 or 15 was a very good compromise because he felt
>it simulated tournament time controls well, you get most of the
>depth (about 2 ply less) with a much reduced investment of time.
>Even if there is an odd/even effect it should be the same at
>about that time control.  He used to test machines using that
>time control.
>
>
>-- Don
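
The sample-size point in the quoted text can be put into rough numbers. The sketch below is not from the original post: it assumes independent games and a normal approximation, and the names score_margin, games_needed and draw_ratio are made up for illustration. With draw_ratio left at 0 (the worst case for noise), 400 games give a 95% band of about plus or minus 4.9 percentage points around an even score, so a 55% result is only just outside the noise and a 51% result is far inside it.

```python
import math

def score_margin(games, score=0.5, draw_ratio=0.0, z=1.96):
    """Approximate 95% margin on a match score (fraction of points won).

    Assumes independent games; draw_ratio is the fraction of drawn games
    (an assumption, the post gives no draw figures).
    """
    # Per-game variance for outcomes 1, 0.5, 0 with the given mean and draws.
    variance = score * (1.0 - score) - draw_ratio / 4.0
    if variance < 0:
        raise ValueError("draw_ratio is too high for this score")
    return z * math.sqrt(variance / games)

def games_needed(difference, score=0.5, draw_ratio=0.0, z=1.96):
    """Games needed before the margin shrinks to `difference`."""
    variance = score * (1.0 - score) - draw_ratio / 4.0
    if variance < 0:
        raise ValueError("draw_ratio is too high for this score")
    return math.ceil((z * math.sqrt(variance) / difference) ** 2)

if __name__ == "__main__":
    print("margin over 400 games:", round(score_margin(400), 4))
    print("games to resolve a 1% edge:", games_needed(0.01))
```

Under these assumptions, resolving a 1% edge takes on the order of 9,600 games, which is far more than a batch at tournament time controls can supply. Draws reduce the variance, so real matches will be somewhat less noisy than the zero-draw case.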

I think the risk you take is a sort of "inbreeding".  One example:

Crafty is a "null-move" searcher.  At shallow depths, null-move can hide
simple threats quite easily.  I used to see this when playing on ICC using
a P5/133 machine.  Once I moved to the P6/200 and tuned things better for
that machine, skill went *down* on the P5.

If I play Crafty vs Crafty on the P5, I don't see the threat problem because
*neither* one sees the threats, since their searches are similar (I am assuming
I am working on eval changes, but it could be search changes as well).
As a result, results at short and long time controls will be the same.  But
if I play another program (Genius, for example), at short time controls
Crafty will suffer more than at longer time controls, because the longer
time controls tend to hide the null-move "weakness" that shallow searches
expose quickly.

So it is possible to draw conclusions from short games that are wrong, if
the short games are A vs A, rather than A vs B...
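
One way to act on the A vs A versus A vs B point is to run the same opening set, with colors reversed, against at least one unrelated program as well as against the previous version. The sketch below is only an illustration of that pairing scheme: build_schedule and the engine labels are hypothetical names, and it does not launch any engine or reflect how either poster's test harness actually works.

```python
from itertools import product

def build_schedule(openings, engine, opponents):
    """Pair `engine` with every opponent on every opening, both colors.

    200 openings and one opponent yield 400 games, matching the batch
    size described in the quoted post.
    """
    games = []
    for opening, opponent in product(openings, opponents):
        # Each opening is played twice so neither side keeps the color edge.
        games.append({"opening": opening, "white": engine, "black": opponent})
        games.append({"opening": opening, "white": opponent, "black": engine})
    return games

if __name__ == "__main__":
    openings = list(range(200))  # placeholder ids for the 10-ply openings
    self_test = build_schedule(openings, "new-version", ["old-version"])
    mixed = build_schedule(openings, "new-version", ["old-version", "other-program"])
    print(len(self_test), len(mixed))
```

Adding a second opponent doubles the number of games, so the cost of checking for "inbreeding" is the same as the cost of another full batch.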



Last modified: Thu, 15 Apr 21 08:11:13 -0700
