Computer Chess Club Archives

Subject: Re: How to judge?

Author: Vincent Diepeveen
Date: 03:55:10 12/29/99
On December 28, 1999 at 20:12:16, Christophe Theron wrote:

>On December 27, 1999 at 19:10:48, Vincent Diepeveen wrote:
>
>>On December 27, 1999 at 18:20:27, Robert Hyatt wrote:
>>
>>>On December 27, 1999 at 16:01:24, Vincent Diepeveen wrote:
>>>
>>>>On December 27, 1999 at 15:38:11, Dann Corbit wrote:
>>>>
>>>>>On December 27, 1999 at 14:57:09, Ed Schröder wrote:
>>>>>[snip]
>>>>>How to know which is best?
>>>>>
>>>>>I think Dr. Hyatt's approach is a good one -- play a bazillion games on the net
>>>>>against quality opponents.  I see that Chris W. and Vincent D. have also
>>>>>followed this strategy.  Since the new improvements to Rebel also allow this
>>>>
>>>>Partly true and partly wrong.
>>>>
>>>>The dead wrong conclusion would be that I improve my program in order to
>>>>blitz better.
>>>>A few blitz games show bugs in evaluation. However, I feel blitz is
>>>>not relevant for measuring my engine's strength these days.
>>>>The general problem with blitz games is that they go too fast.
>>>>I can't examine a hundred games each day!
>>>>
>>>>I feel standard rated is a lot more important. However, Bob's success is so
>>>>massive that hundreds of Crafties play out there. So the rating
>>>>depends so much on how DIEP scores against the current Crafty version
>>>>that it's sometimes hard to draw conclusions. Some people running
>>>>DIEP at ICC set !computer for exactly this reason.
>>>>
>>>>Secret has !computer but plays every computer unrated. Moron has
>>>>!computer only at blitz; otherwise it would spend the interesting hours
>>>>doing nothing but playing a thousand 3 0 games
>>>>against a dual Crafty.
>>>>
>>>>Despite it allowing all computers at all levels
>>>>unrated (and rated at standard), to my big surprise not many, apart
>>>>from a few programmers/bookmakers, match Moron with their program. The
>>>>vast majority of operators seemingly only get their kicks from showing off,
>>>>as they usually find the quickest level at which they can match DIEP
>>>>running under JudgeTurpin (which allows rated games against everyone, with
>>>>no rating limits; 1100-rated players sometimes fanatically play it a
>>>>couple of games).
>>>>
>>>>>kind of competition directly, I expect that you can gather a massive amount of
>>>>>data with free testers at will.  You can see how a change in Rebel performs
>>>>>against top computers.  You can see how a change in Rebel performs against top
>>>>
>>>>Don't expect a single 40 in 2 game though, Ed, in case you're interested...
>>>>...ICC doesn't allow "m moves in t time" levels.
>>>>
>>>>>humans.  I suggest you may write a parameter driven version of Rebel (or an
>>>>>engine that can write personalities to disk based upon a set of criteria) and
>>>>>then run one hundred games with the parameter at one setting, change the setting
>>>>>and run another hundred.  Using this sort of technique, you can find out what
>>>>>settings work best against various types of competition.  I think that will work
>>>>>very much better than your contest, since the attempts at producing good
>>>>>settings by others will be redundant and unscientific, for the most part.
>>>>
>>>>I completely disagree here. 100 blitz games are not going to show much.
>>>>Apart from that, you're dependent on who you play.
>>>>
>>>
>>>
>>
>>Bob, you're saying exactly what I wrote above... ...in case of
>>gross eval blunders you see them of course, and if the program has
>>a bug which causes it to crash then it directly loses bunches of games...
>>...but other changes are pretty hard to judge.
>>
>>Like if I add some stupid and completely insane pruning, then JudgeTurpin
>>will for sure gain 100 points at blitz.
>
>
>That's exactly where the pleasure of reading your posts lies. Endless laughs
>at reading things like the sentence above.
>
>Keep on amusing us.

I know you like to laugh. What I stated above I can simply show
as a fact. See JudgeTurpin's ratings:

Information about JudgeTurpin(C) (Last disconnected Sun Dec 26 1999 18:27):

          rating [need] win  loss  draw total   best
Wild        2026  [6]     3     3     0     6
Loser's     1946  [6]     0     1     0     1
Bullet      2499  [2]    97    59    21   177   2558 (27-Nov-1999)
Blitz       2492       3928  1027   525  5480   2730 (10-Sep-1998)
Standard    2372        889   406   198  1493   2523 (19-Sep-1999)

 1: Diep 2.0.something, PII 266 with 64megs of RAM
 8: This account is managed by SweenyTod

JudgeTurpin's high score was set by a version that
pruned like hell.

It's stable between 2450 and 2600 now. Though DIEP obviously improved,
JudgeTurpin's ratings didn't improve. It did get close to 2523 recently,
though.
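
A rough way to put numbers on how much a game record can show is to turn
the win/loss/draw counts into an Elo difference with an error margin. The
sketch below is only an illustration: the function names are made up here,
the 100-game sample is invented, and it assumes independent games under the
usual logistic Elo model. It ignores opponent ratings, so it does not
reproduce how ICC actually computes a rating.

```python
import math

def score_fraction(wins, losses, draws):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    # Clamp so a 0% or 100% score does not divide by zero.
    score = min(max(score, 1e-6), 1 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, losses, draws, z=1.96):
    """Approximate 95% interval (low, high) for the Elo difference."""
    games = wins + losses + draws
    s = score_fraction(wins, losses, draws)
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / games
    margin = z * math.sqrt(var / games)
    return elo_from_score(s - margin), elo_from_score(s + margin)

# Blitz record taken from the finger output above.
full = elo_interval(3928, 1027, 525)
# Hypothetical 100-game sample, invented for illustration only.
small = elo_interval(55, 35, 10)

print("5480 games: %+.0f to %+.0f Elo" % full)
print(" 100 games: %+.0f to %+.0f Elo" % small)
```

Under these assumptions the 100-game interval comes out several times wider
than the 5480-game one, which is the statistical side of the argument that a
hundred games cannot separate small changes.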

>
>    Christophe
>
>
>
>
>>If you search 6 ply at blitz then you can butcher the search and still
>>do better...
>>
>>
>>>I disagree with your disagreement.  :)  Blitz games _are_ useful.  Because
>>>they can, with a lot of work, highlight holes that have to be fixed.  IE the
>>>most recent change to my eval, reported here a few weeks ago.  Roman watched
>>>it play against several different GM players, and he noticed that once it got
>>>to king and pawn endings, it greatly over-valued connected passers vs non-
>>>connected passers.  And that 'hole' was quickly repaired, so that it hasn't
>>>lost to that particular glitch again.  But this was found from blitz games.
>>>
>>>I am careful about using blitz games, of course, as at the Paris WMCCC event
>>>I had allowed the tuning to get grossly out of line.  It was holy hell at blitz
>>>on ICC, but it played badly at longer time controls vs computers.  I tend to
>>>watch all the standard games it plays carefully (Varguz generally plays at least
>>>2 one hour + games every day, others are doing the same thing with commercial
>>>programs).  But blitz games _can_ reveal weaknesses.  I have found passed pawn
>>>problems, distant passed pawn problems, majority problems, and so forth.  By
>>>going over lots of games quickly looking for that "pattern/trend" that is giving
>>>it problems...
>>>
>>>
>>>
>>>>
>>>>>By using the net as a resource, you double your compute power.  By selling
>>>>>copies of Rebel that can use the net as a resource you multiply your compute
>>>>>power by the number of sales (e.g. you can gather a huge number of games from
>>>>>the net and calculate strengths and weaknesses against rated opponents and you
>>>>>don't even have to run them).
>>>>>
>>>>>Suggestion:
>>>>>Have Rebel automatically annotate the network games with settings information so
>>>>>that you can glean the effectiveness of various settings.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.