Author: Robert Hyatt
Date: 07:43:12 01/01/98
On January 01, 1998 at 01:45:12, Don Dailey wrote:

>>>Here is how I do SELF testing and some of my thoughts on it.
>>>
>>>I have 200 very shallow openings and they go to exactly 5 moves for
>>>each player or 10 ply. Larry Kaufman picked them so that they are
>>>all relatively normal, span a lot of theory but do not get you too
>>>deeply into the game (so we test opening heuristics too.)
>>>
>>>200 openings give us a total of 400 games, so we try to test in
>>>batches of 400 games.
>>
>>How long does a 400-game match take to get accurate results?
>
>I often start with very fast games so I can get lots of results very
>quickly. Often a new algorithm will get badly beaten and it will
>make no real sense to continue. But when everything seems ok and not
>too one-sided I migrate to longer and longer games.
>
>Here is an interesting topic we should discuss: How meaningful is
>it to test at much faster time controls than actual tournament time
>controls? Here is my sense of this subject:
>
>At one time I believed it to be very important to do lots of testing
>at tournament time controls, after all, that is what you are trying
>to optimize. But my opinion on this now is that it is (mostly) a
>waste of time. On virtually every test I ever do, I get the same
>results on average. I will test levels that vary from 1/2 sec or
>less per move up to 2 or 3 minutes per move and the better
>algorithm tends to win at ALL levels with about an equal ratio. The
>only trend I have ever noticed is that really tactical algorithms
>like wild move extensions will tend to do much better on really low
>levels. But after 3 or 4 ply it does not seem to make a bit of
>difference. I have yet to find an algorithm that scores better at
>fast time and worse at long (or vice versa.)
>Even with the wild tests it was more like 55% at 2 and 3 ply, 51%
>at everything higher!
>
>Now there are a few that SEEM to have this behaviour until I check
>it out more thoroughly and get bigger sample sizes and test at even
>higher levels. It always turns out to be statistical noise I saw.
>
>So my current belief is that you need to search deep enough to get
>beyond about 3 ply and after this it will not matter much at all.
>Even if there is a very slight effect it's cancelled out by the
>small sample sizes you are limited to with long games. If I had
>to test at tournament time controls it would take 2 months to get
>in 300 or 400 games. This is ridiculous and I don't consider it
>enough games anyway. I feel like it's a fortunate accident that
>this kind of testing may not be all that necessary.
>
>Occasionally I "bite the bullet" and go for some longer testing
>but it has yet to show me something I couldn't see with the short
>testing. Larry Kaufman once believed (I assume he still does)
>that game in 10 or 15 was a very good compromise because he felt
>it simulated tournament time controls well, you get most of the
>depth (about 2 ply less) with a much reduced investment of time.
>Even if there is an odd/even effect it should be the same at
>about that time control. He used to test machines using that
>time control.
>
>
>-- Don

I think the risk you take is a sort of "inbreeding". One example: Crafty is a "null-move" searcher. At shallow depths, null-move can hide simple threats quite easily. I used to see this when playing on ICC using a P5/133 machine. Once I moved to the P6/200 and tuned things better for that machine, skill went *down* on the P5.

If I play Crafty vs Crafty on the P5, I don't see the threat problem because *neither* one sees the threats, since their searches are similar (I am assuming I am working on eval changes, but it could also be search changes as well). As a result, results at short and long time controls will be the same.
But if I play another program (like Genius for example), at short time controls Crafty will suffer more than at longer time controls, because the longer time controls tend to hide the null-move "weakness" that shallow searches expose quickly. So it is possible to draw conclusions from short games that are wrong, if the short games are A vs A, rather than A vs B...
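
To put rough numbers on the "statistical noise" Don mentions for 400-game batches, here is a minimal sketch that treats every game as an independent trial and uses a normal approximation. None of this comes from either poster: the function names, the draw_ratio parameter and the 95% level (z = 1.96) are all assumptions made for illustration.

```python
import math

def score_margin(n_games, score=0.5, draw_ratio=0.0, z=1.96):
    """Approximate margin of error on a match score fraction.

    Assumes independent games; draw_ratio is a hypothetical knob
    (draws lower the per-game variance).
    """
    wins = score - draw_ratio / 2.0
    # Per-game variance of a result worth 1 (win), 0.5 (draw) or 0 (loss).
    variance = wins + 0.25 * draw_ratio - score * score
    return z * math.sqrt(variance / n_games)

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def games_needed(score, draw_ratio=0.0, z=1.96):
    """Games needed before the margin shrinks to the edge over 50%."""
    wins = score - draw_ratio / 2.0
    variance = wins + 0.25 * draw_ratio - score * score
    edge = score - 0.5
    return math.ceil(z * z * variance / (edge * edge))

if __name__ == "__main__":
    print("margin at 400 games:", round(score_margin(400), 3))
    print("Elo implied by 55%:", round(elo_from_score(0.55), 1))
    print("games to resolve 51%:", games_needed(0.51))
```

Under these assumptions a 400-game batch with no draws carries a margin of roughly plus or minus 5 percentage points, so a 51% result cannot be told apart from an even match without thousands of games, while a 55% result sits right at the edge of what 400 games can show.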