Computer Chess Club Archives

Subject: Re: WCCC vs auto232

Author: Robert Hyatt
Date: 20:19:04 09/14/00
On September 14, 2000 at 22:59:20, James T. Walker wrote:

>On September 14, 2000 at 20:07:27, Robert Hyatt wrote:
>
>>On September 14, 2000 at 12:51:50, James T. Walker wrote:
>>
>>>At the risk of being on the wrong side of programmers, I have to agree with
>>>Enrique.  I think programmers, especially commercial programmers, put a lot of
>>>emphasis on the World Championship because of the extra sales it might bring.
>>>In my opinion it is just another tournament with a lot at stake.  Testing
>>>programs like the SSDF does is a more "real world" situation.  Still the best
>>>test of all would be hundreds of games vs humans.  But even better than that is
>>>that you test it yourself and decide if it does what YOU want.
>>>Jim Walker
>>
>>
>>It really isn't "real world".  IE when you buy a new engine to play against,
>>how often do you play hundreds of games?  The SSDF test method tests one
>>particular aspect of an engine quite well:  how will it adapt after it wins
>>or loses a game?  But if you are playing just one game against Kasparov, that
>>aspect is totally unimportant.  If you are playing in a WMCCC event, you might
>>well spend a lot of time preparing book lines to kill your opponents.  I don't
>>have time for this so I usually try to prepare lines to take the opponent out
>>of book before his book can kill me (I didn't have time for this in the recent
>>WMCCC event).  So the WMCCC event tests programmer preparation more than any
>>other factor.  I think ICC is the _hardest_ test to pass.  You will play
>>humans at IM and GM strength, and they will play hundreds of games, trying to
>>crack your book, or find a positional weakness they can exploit, and then they
>>will do so over and over until you fix it.  Opening preparation is no good
>>there if you play 100% automatically.  You _must_ have some randomness or you
>>get killed no matter how good your "good lines" are.
>>
>>I think it is a question of 'benchmarking'.  When someone asks _me_ which
>>processor to buy, I don't randomly say "buy Intel" or "buy AMD".  I _always_
>>say "benchmark the software you want to run and use the result to choose."
>>Because different programs respond differently to different processors.  And
>>the processor you get the best results on won't necessarily be the one I get
>>the best results on.  Similarly, you should test an engine in the environment
>>you expect to use it in.  If you want to annotate/analyze your games for errors,
>>who cares how its "learning" works?  If you want to run automatically on the
>>servers using one of the new auto-interfaces that are available, you had
>>_better_ have learning facilities or you get cooked, but good.  If your goal is
>>to troll around ICC trying to produce the most inflated rating you can, you
>>should probably choose the program at the top of the SSDF list (for the record,
>>I dislike ICC computer accounts that run multiple programs....  it is not a
>>reasonable thing to do under one account, any more than it would be reasonable
>>to have a GM, an IM and a patzer all playing using one handle.)  If your
>>intent is to find the strongest opponent for yourself or to use against a human,
>>then the SSDF list is not the place to look.
>>
>>That's about as simply as I can explain what ought to be going on when someone
>>looks at the various results.  Just because a program is on top of the SSDF does
>>_not_ mean it will do the best against GM players.  Just because a program does
>>better than others against GM players does not mean it will do better on the
>>SSDF list.  And you can add any sub-combinations of the above that you like
>>to the discussion.  Some SSDF games are played in long matches where learning
>>is critical.  Some are played as single games where learning is not used at
>>all.  Learning doesn't flow across two different testers either.  Ditto on the
>>chess servers and at WMCCC events...
>>
>>All very cloudy, IMHO.  Hard to see the facts through the heavy "fog"...
>
>
>Hello Bob,
>Boy you jumped all over my SSDF statement! :-)  If you read my post again you
>will see that, in order of what I consider important, SSDF, although mentioned
>first, is actually the last choice.  I agree it's very limited in its
>usefulness.  But compared to playing 7-10 games in a tournament it is way ahead,
>mostly because of the reasons you mention.


I don't think it is that clear.  IE is "crafty" the crafty I show up at a
WMCCC event with?  Or is it the one that is available to everybody?  Because
when I go to a WMCCC (or any computer chess event) I try to at least have
some controls on the openings that I would _not_ put in place on a chess
server, ever.  Which means that even though the "engine" is identically the
same, the tournament results can be drastically different due to opening lines
that are chosen by me vs someone else.

It has been shown that killer book lines happen in SSDF matches.  It has been
shown that they happen in computer chess events.  Using the infamous Nunn match
conditions to compare engines is no better, because those are pre-chosen
positions that one program might be better tuned for than the other.  So that
can bias the results.

Perhaps there isn't an "optimal" way to answer the question "who is best?"
without playing so many zillions of games that book cooking stops working,
since learning can begin to avoid bad openings pretty quickly if done
right.
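
To make the learning idea concrete, here is a minimal sketch of a book that picks its opening moves at random and shifts weight away from lines that lose. This is only an illustration, not how Crafty or any other program actually implements book learning; the class name LearningBook, the weight update factor and the 0.01 floor are all assumptions made up for the example.

```python
import random


class LearningBook:
    """Toy opening book: random move choice, weighted by past results.

    Hypothetical sketch; real engines store far more detail per line.
    """

    def __init__(self, moves, rng=None):
        # Every book move starts with the same neutral weight.
        self.weights = {move: 1.0 for move in moves}
        self.rng = rng or random.Random()

    def choose(self):
        # Weighted random pick, so the book stays unpredictable.
        moves = list(self.weights)
        weights = [self.weights[m] for m in moves]
        return self.rng.choices(moves, weights=weights)[0]

    def learn(self, move, result):
        # result: 1.0 = win, 0.5 = draw, 0.0 = loss.
        # Assumed update: a loss halves the weight, a draw keeps it,
        # a win multiplies it by 1.5. The floor keeps every move
        # selectable, just rarely.
        factor = 0.5 + result
        self.weights[move] = max(0.01, self.weights[move] * factor)


book = LearningBook(["e4", "d4"], rng=random.Random(0))
for _ in range(10):
    book.learn("e4", 0.0)   # the e4 line keeps getting cooked
print(book.weights)         # e4 is now at the floor weight
```

After a handful of losses the cooked line is almost never chosen, while the random choice among the surviving lines keeps an opponent from preparing against one fixed repertoire.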



>  My second choice was of course
>hundreds of games vs humans (Like on ICC or top level tournaments).  My "best"
>choice was do your own testing to see if it does what you want it to do.  That
>could be blitz vs other computers, long games vs computers, overnight analysis,
>postal type analysis, processing EPD files, etc. etc. etc.  So I don't think we
>disagree here at all.

I agree that we agree. :)



>By the way I sent you a message on ICC and never got an answer so here it is
>again.  I recently downloaded Crafty 17.13 and ran 32 games vs Fritz.  The score
>was something like 27.5 - 4.5 in favor of Fritz.  My question was: have you done
>something to Crafty to make it weaker at blitz in order to improve it at longer
>time controls? (I know it's a small sample)
>Jim


I can't imagine doing anything that would make it play that badly, unless you
are using one computer and the hardware is pretty slow.  I don't do anything
to constrain my null move search (R varies from 3 down to 2 inside the tree).
And shallow searches will expose the holes that leaves open.  So I haven't
done anything to make it weaker, I have simply paid more attention to games
played at longer time controls.  Which, in a way, could "let" it play weaker
since I am not paying much attention to quick games vs computers.  I know how
to make it play stronger blitz games vs computers.  But that is contrary to
what is needed to play endgames against GM players and win them most of the
time.  IE if you want, I can show you some games played by the "top" programs
that show where they fail...  they can be great at tactics, but have some very
important basic chess knowledge totally missing.  Mostly dealing with endgames.
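
As a rough sketch of what "R varies from 3 down to 2" can look like: the null-move reduction R is chosen from the remaining depth, using the larger value far from the leaves and the smaller one near them. The switch_depth threshold of 6 plies and both function names are assumptions for illustration only, not the actual values or code in Crafty.

```python
def null_move_reduction(depth, switch_depth=6):
    """Pick R: 3 when plenty of depth remains, 2 closer to the leaves.

    switch_depth is an assumed threshold, not a value from any engine.
    """
    return 3 if depth > switch_depth else 2


def null_search_depth(depth, switch_depth=6):
    """Depth left for the reduced search after making the null move.

    One ply is used by the null move itself, then R more are cut.
    """
    reduction = null_move_reduction(depth, switch_depth)
    return max(0, depth - 1 - reduction)


for depth in (10, 7, 6, 4, 2):
    print(depth, null_move_reduction(depth), null_search_depth(depth))
```

In a shallow blitz search most nodes already have little depth left, so the reduced null-move search often ends at or near depth 0 and can miss a threat just past the horizon. A deeper search at longer time controls leaves more depth after the reduction, which is one way to read the point about shallow searches exposing the holes.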

IE you have a queen, I have a queen, I have an a-pawn and a g-pawn, you have a
g-pawn.  Will you try to trade queens?  I don't think so.  Yet a well-known
program will do it in a heartbeat, showing it doesn't understand a basic
endgame principle.  Adding such knowledge robs tactical skill and lowers
blitz performance, where your opponent can then often find a tactical kill
before his missing knowledge drops him into a strategic loss.
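
A piece of knowledge like this could be sketched as a small evaluation term. The function name, the 50 centipawn value and the idea of scoring it by pawn count alone are assumptions for illustration; a real engine would need far more careful conditions, and every such test costs time at each node evaluated.

```python
def queen_trade_adjustment(own_pawns, opp_pawns, penalty=50):
    """Score change (centipawns) for offering a queen trade when only
    queens and pawns remain on the board.

    Hypothetical rule: the side with fewer pawns should avoid the trade,
    because the resulting pawn ending is usually worse for it; the side
    with more pawns should welcome it.
    """
    if own_pawns < opp_pawns:
        return -penalty   # behind in pawns: keep the queens on
    if own_pawns > opp_pawns:
        return penalty    # ahead in pawns: trading simplifies
    return 0              # equal pawns: no preference from this term


# The position described above: one g-pawn against an a-pawn and a g-pawn.
print(queen_trade_adjustment(own_pawns=1, opp_pawns=2))
```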

I'm letting the hardware slowly cover the tactical holes, because I know that
it will never cover the strategic holes, which is where I come in...
Last modified: Thu, 15 Apr 21 08:11:13 -0700