Author: James T. Walker
Date: 19:59:20 09/14/00
On September 14, 2000 at 20:07:27, Robert Hyatt wrote:

>On September 14, 2000 at 12:51:50, James T. Walker wrote:
>
>>At the risk of being on the wrong side of programmers, I have to agree with
>>Enrique. I think programmers, especially commercial programmers put a lot of
>>emphasis on the World Championship because of the extra sales it might bring.
>>In my opinion it is just another tournament with a lot at stake. Testing
>>programs like the SSDF does is a more "real world" situation. Still the best
>>test of all would be hundreds of games vs humans. But even better than that is
>>that you test it yourself and decide if it does what YOU want.
>>Jim Walker
>
>
>It really isn't "real world". IE when you buy a new engine to play against,
>how often do you play hundreds of games? The SSDF test method tests one
>particular aspect of an engine quite well: how will it adapt after it wins
>or loses a game? But if you are playing just one game against Kasparov, that
>aspect is totally unimportant. If you are playing in a WMCCC event, you might
>well spend a lot of time preparing book lines to kill your opponents. I don't
>have time for this so I usually try to prepare lines to take the opponent out
>of book before his book can kill me (I didn't have time for this in the recent
>WMCCC event). So the WMCCC event tests programmer preparation more than any
>other factor. I think ICC is the _hardest_ test to pass. You will play
>humans at IM and GM strength, and they will play hundreds of games, trying to
>crack your book, or find a positional weakness they can exploit, and then they
>will do so over and over until you fix it. Opening preparation is no good
>there if you play 100% automatically. You _must_ have some randomness or you
>get killed no matter how good your "good lines" are.
>
>I think it is a question of 'benchmarking'. When someone asks _me_ which
>processor to buy, I don't randomly say "buy Intel" or "buy AMD". I _always_
>say "benchmark the software you want to run and use the result to choose."
>Because different programs respond differently to different processors. And
>the processor you get the best results on won't necessarily be the one I get
>the best results on. Similarly, you should test an engine in the environment
>you expect to use it in. If you want to annotate/analyze your games for errors,
>who cares how its "learning" works? If you want to run automatically on the
>servers using one of the new auto-interfaces that are available, you had
>_better_ have learning facilities or you get cooked, but good. If your goal is
>to troll around ICC trying to produce the most inflated rating you can, you
>should probably choose the program at the top of the SSDF list (for the record,
>I dislike ICC computer accounts that run multiple programs.... it is not a
>reasonable thing to do under one account, any more than it would be reasonable
>to have a GM, and IM and a patzer all playing using one handle.) If your
>intent is to find the strongest opponent for yourself or to use against a human,
>then the SSDF list is not the place to look.
>
>That's about as simply as I can explain what ought to be going on when someone
>looks at the various results. Just because a program is on top of the SSDF does
>_not_ mean it will do the best against GM players. Just because a program does
>better than others against GM players does not mean it will do better on the
>SSDF list. And you can add any sub-combinations of the above that you like
>to the discussion. Some SSDF games are played in long matches where learning
>is critical. Some are played as single games where learning is not used at
>all. Learning doesn't flow across two different testers either. Ditto on the
>chess servers and at WMCCC events...
>
>All very cloudy, IMHO. Hard to see the facts through the heavy "fog"...

Hello Bob,
Boy, you jumped all over my SSDF statement! :-) If you read my post again you will see that, in order of what I consider important, the SSDF, although mentioned first, is actually the last choice. I agree it's very limited in its usefulness. But compared to playing 7-10 games in a tournament it is way ahead, mostly because of the reasons you mention. My second choice was of course hundreds of games vs humans (like on ICC or in top-level tournaments). My "best" choice was to do your own testing to see if it does what you want it to do. That could be blitz vs other computers, long games vs computers, overnight analysis, postal-type analysis, processing EPD files, etc. So I don't think we disagree here at all.

By the way, I sent you a message on ICC and never got an answer, so here it is again. I recently downloaded Crafty 17.13 and ran 32 games vs Fritz. The score was something like 27.5 - 4.5 in favor of Fritz. My question was: have you done something to Crafty to make it weaker at blitz in order to improve it at longer time controls? (I know it's a small sample.)
Jim
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.