Computer Chess Club Archives


Subject: Re: How to judge?

Author: Robert Hyatt

Date: 15:20:27 12/27/99



On December 27, 1999 at 16:01:24, Vincent Diepeveen wrote:

>On December 27, 1999 at 15:38:11, Dann Corbit wrote:
>
>>On December 27, 1999 at 14:57:09, Ed Schröder wrote:
>>[snip]
>>How to know which is best?
>>
>>I think Dr. Hyatt's approach is a good one -- play a bazillion games on the net
>>against quality opponents.  I see that Chris W. and Vincent D. have also
>>followed this strategy.  Since the new improvements to Rebel also allow this
>
>Partly true and partly wrong.
>
>It would be a dead wrong conclusion that I improve my program in order to
>blitz better.
>A few blitz games show bugs in evaluation. However, I feel blitz is not
>relevant for measuring my engine's strength these days.
>The general problem with blitz games is that they go too fast.
>I can't examine 100 games each day!
>
>I feel standard rated is a lot more important. However, Bob's success is so
>massive that there are hundreds of Crafties playing out there. So the rating
>depends so much on how Diep scores against the current Crafty version
>that it's sometimes hard to draw conclusions. Some people running
>Diep at ICC set !computer for exactly this reason.
>
>Secret has !computer but plays every computer unrated. Moron has
>!computer only at blitz; otherwise, during the interesting hours, it would
>be busy with nothing but a thousand 3 0 games
>against a dual Crafty.
>
>Although it allows all computers at all levels
>unrated (and rated at standard), to my big surprise not many, apart
>from a few programmers/bookmakers, match Moron with their program. The
>vast majority of operators seemingly care only about their own ego,
>as they usually find the quickest level at which they can match DIEP
>running under judgeturpin (which allows rated games against everyone, with no
>rating limits; 1100-rated players sometimes fanatically play it a couple of games).
>
>>kind of competition directly, I expect that you can gather a massive amount of
>>data with free testers at will.  You can see how a change in Rebel performs
>>against top computers.  You can see how a change in Rebel performs against top
>
>Don't expect a single 40 in 2 game though, Ed, in case you're interested...
>...ICC doesn't allow "m moves in t time" levels.
>
>>humans.  I suggest you may write a parameter driven version of Rebel (or an
>>engine that can write personalities to disk based upon a set of criteria) and
>>then run one hundred games with the parameter at one setting, change the setting
>>and run another hundred.  Using this sort of technique, you can find out what
>>settings work best against various types of competition.  I think that will work
>>very much better than your contest, since the attempts at producing good
>>settings by others will be redundant and unscientific, for the most part.
>
>I completely disagree here. 100 blitz games are not going to show much.
>Apart from that, you're dependent on who you play against.
>
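
For scale on the 100-game question above, here is a minimal sketch of how wide the uncertainty on a single match result is. It assumes the games are independent and uses the standard logistic Elo formula with a normal approximation; the function names and the example result are hypothetical, not taken from any engine or server mentioned in this thread.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction, using the logistic Elo model."""
    # Clamp so that 0% or 100% scores do not divide by zero or take log of zero.
    s = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / s - 1.0)

def match_summary(wins, draws, losses, z=1.96):
    """Return (score, elo_low, elo_mid, elo_high) for one match.

    z=1.96 gives an approximate 95% interval under a normal approximation,
    which assumes the games are independent of each other.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game result (1, 0.5 or 0 points).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return score, elo_diff(score - margin), elo_diff(score), elo_diff(score + margin)

if __name__ == "__main__":
    # Hypothetical result of 100 games with one parameter setting.
    score, lo, mid, hi = match_summary(wins=45, draws=20, losses=35)
    print(f"score {score:.3f}, Elo {mid:+.0f} (about {lo:+.0f} to {hi:+.0f})")
```

For 100 decisive games split evenly, the interval works out to roughly plus or minus 70 Elo, so two settings have to differ by a lot before 100 games can separate them by score alone. Looking through the same games for a recurring positional pattern is a different use of the data and is not covered by this estimate.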



I disagree with your disagreement.  :)  Blitz games _are_ useful, because
they can, with a lot of work, highlight holes that have to be fixed.  For
example, take the most recent change to my eval, reported here a few weeks ago.
Roman watched it play against several different GM players, and he noticed that
once it got to king and pawn endings, it greatly over-valued connected passers
vs. non-connected passers.  That 'hole' was quickly repaired, so that it hasn't
lost to that particular glitch again.  But this was found from blitz games.

I am careful about using blitz games, of course, as at the Paris WMCCC event
I had allowed the tuning to get grossly out of line.  It was holy hell at blitz
on ICC, but it played badly at longer time controls vs computers.  I tend to
watch all the standard games it plays carefully (Varguz generally plays at least
two one-hour-plus games every day; others are doing the same thing with
commercial programs).  But blitz games _can_ reveal weaknesses.  I have found
passed pawn problems, distant passed pawn problems, majority problems, and so
forth, by going over lots of games quickly, looking for that "pattern/trend"
that is giving it problems...



>
>>By using the net as a resource, you double your compute power.  By selling
>>copies of Rebel that can use the net as a resource you multiply your compute
>>power by the number of sales (e.g. you can gather a huge number of games from
>>the net and calculate strengths and weaknesses against rated opponents and you
>>don't even have to run them).
>>
>>Suggestion:
>>Have Rebel automatically annotate the network games with settings information so
>>that you can glean the effectiveness of various settings.
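
One way the annotation idea above could look in practice is to record the active settings as extra tag pairs in each game's PGN header, since PGN permits tags beyond the standard seven. This is only a sketch under assumptions: the "Setting" tag prefix, the setting names, and the function name are all made up for illustration and do not describe how Rebel stores or reports its settings.

```python
def pgn_with_settings(headers, settings, movetext):
    """Build PGN text whose tag section also records engine settings."""
    def tag(name, value):
        # PGN string values escape backslash and double quote with a backslash.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'[{name} "{text}"]'

    lines = [tag(name, value) for name, value in headers.items()]
    # The "Setting" prefix is a made-up convention for this sketch.
    lines += [tag("Setting" + name, value)
              for name, value in sorted(settings.items())]
    return "\n".join(lines) + "\n\n" + movetext + "\n"

if __name__ == "__main__":
    # Hypothetical game header and hypothetical setting names.
    headers = {"Event": "ICC blitz", "White": "EngineA",
               "Black": "EngineB", "Result": "1-0"}
    settings = {"KingSafety": 120, "PassedPawns": 90}
    print(pgn_with_settings(headers, settings, "1. e4 e5 2. Nf3 1-0"))
```

With the settings stored next to the result, games can later be grouped by setting value and scored per group without rerunning anything.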



