Computer Chess Club Archives



Subject: Re: Knee jerk reaction!

Author: Sune Fischer

Date: 02:17:39 09/13/04



On September 12, 2004 at 20:49:30, Robert Hyatt wrote:

>>I won't go too much into this again, I think Sandro and I have already
>>discussed this :)
>>
>>> It
>>>won't predict how well it analyses.
>>
>>Why not?
>
>Because you force it into positions it won't normally play, against programs
>that also get forced into odd positions they won't normally play.  How will
>making decisions about that tell you anything about which is the best for
>analysis???

But Bob, I _want_ to force it into positions it doesn't normally play, that is
the _idea_.

When you use a specially designed, very narrow book it might never play d4, but
lots of people are interested in d4 openings and want to see how the engine
does there.

In analysis the engine cannot pick and choose its own narrow set of test
positions; you can kick and scream all you want, but it will _have_ to be good
on a wide range of very different types of positions.

>
>No, giving it _the_ book customized for it will tell you how well it can play.
>Giving it an odd book most likely will weaken the thing.

I'm not interested in how it plays from an optimized, error-free and very
narrow selection of opening positions.
I'm interested in how it plays on a wide, bushy selection of openings that are
roughly equal.

Because I don't intend to use it in tournaments (only the author is allowed to
do that), my purpose is to use it for analysis!

>I don't see how you
>can use _games_ to predict how well a program can do in analysis...  The two are
>not related directly.  I've known plenty of strong players that couldn't explain
>a thing, and weaker players that could point out problems very clearly...

I use games because I don't believe it is possible to generate a representative
set of test positions.
By playing from lots of equal but complicated endgame positions you will be able
to tell who is the better endgame player.

It's true that time management and a few other game-dependent things might make
this different from pure analysis mode, but hey, nothing is perfect.

>>No, but it will reveal its strong and weak points, which might be of interest
>>to the user.
>
>What user is qualified to figure that out?  When the programs are so much
>stronger than 99.9% of the humans that are trying to figure this out...
>
>IE bozos can't really decide which brain surgeon is the best...

Here is how this bozon would do it: he would test it, look at the results,
perhaps grind some statistics, and draw his conclusions.
Standard procedure really. :)
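A sketch of that statistics grinding, in Python (the W/D/L numbers and the function name are of course made up; the error bar uses a plain normal approximation):

```python
import math

def match_stats(wins, draws, losses):
    """Score fraction and a rough 95% confidence interval
    (normal approximation) for a W/D/L match result."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of the per-game score (each game scores 1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = 1.96 * math.sqrt(var / n)
    return score, margin

score, margin = match_stats(wins=48, draws=30, losses=22)
print(f"score {score:.1%} +/- {margin:.1%}")
```

With only 100 games the error bar is several percentage points wide, which is exactly why one match result alone settles so little.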

>>You are the one who has been saying it makes _no_ sense to play without own
>>books, I'm just trying to show you that there are reasons to play without them.
>
>I haven't seen a single good example of how/why doing this provides useful
>information.

The reason is simply that you find only one single experiment interesting,
while I can think of more than 20 different ways to test engines that would
enable me to say more about their characteristics, their strengths and their
weaknesses.

Why you don't find these experiments interesting I have no idea.

>>My job is easy, I just need one single counter example :)
>
>No, you have it backward.  You are saying something is OK.  I have given more
>than one counter-example of why it is _not_ ok...  You can't give one example of
>why it _is_ ok and then conclude "it is ok."

I can certainly conclude that "it is ok" in that special case, the case we
happen to be talking about in fact.

>>>It is taking hokey positions, making programs play them against each
>>>other, and then trying to draw conclusions from that.  The two are _not_ the
>>>same thing.
>>>Ditto for learning on/off, pondering on/off, etc...
>>
>>I disagree.
>
>Then we just have to agree to disagree.  My experience leads me to one
>conclusion.  Based on writing several programs, playing in all sorts of
>competitions, etc...

Of course you can draw conclusions.

Say Crafty plays 10000 games against Fritz everything on, own books, learning
etc..

Fritz wins 70%-30%.

Now we disable books and use a selected wide variety of opening positions,
each played with sides switched, to make everything _equal_.

Fritz wins, but only 55%-45%.

Now we do it again, this time with learning disabled for both.

Fritz wins, 83%-17%.

If you are telling me that you cannot conclude anything from that, then yes, I
will certainly claim full disagreement.

Moreover I find it interesting to study where each engine has its strong and
weak points.
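For what it's worth, score percentages like those above map to Elo differences through the usual logistic expected-score model; a small sketch (the match scores are the hypothetical ones from the example, not real results):

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score, logistic model:
    score = 1 / (1 + 10**(-diff/400)), solved for diff."""
    return -400.0 * math.log10(1.0 / score - 1.0)

for label, s in [("own books + learning", 0.70),
                 ("no books, equal positions", 0.55),
                 ("learning disabled", 0.83)]:
    print(f"{label}: {elo_diff(s):+.0f} Elo")
```

A 70% score is about +147 Elo while 55% is only about +35, so the three conditions really do tell different stories about the same pair of engines.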

>>
>>>>>No endgame tables?
>>>>
>>>>There is no room for endgame tables on his laptop.
>>>
>>>Baloney.  I have a sony VAIO with a 20 gig hard drive.  I have _all_ the 3-4-5
>>>piece files on it...  20 gig drives are small today.
>>
>>I have a 10 GB drive and it is full.
>
>You made that choice.  You _could_ get the 5 piece tables on there if you
>_wanted_ them.  That is the point...  It isn't a matter of "can't".  It is a
>matter of "don't want to".

Wrong. It's a matter of "I can't", because I also need a few other programs for
work-related stuff. I could settle for the 3/4-man tables, but really that disk
is so slow it's not even funny.

>>
>>To take another example, how are you going to use endgame tables on the
>>PocketPC?
>>http://www.pocketgear.com/software_detail.asp?id=15142
>
>In 5 years the answer will be obvious.. :)

Nevertheless there is currently a reason to test without endgame tables.
I guess I can rest my case here :)

>>>So?  I do ponder=on matches on my single-cpu laptop all the time.  No problems
>>>at all
>>
>>How do you make sure they get 50% cpu each?
>
>I don't.  I trust the O/S to do that.  I just watch something like "top" to be
>sure it is correct most of the time.  If one chooses to not ponder for some
>reason, oh well...

I can think of many interesting experiments, but this experiment I would have to
call crap.

>>
>>What happens when one engine hits EGTBs or runs a high-priority thread?
>
>You aren't going to run a "high-priority" thread on a real O/S, unless you are
>running as a privileged user.  If so, that is so far beyond stupid as to not
>need any explanation.  Easy way to lose the whole system, so it shouldn't be
>done.  Of course you should not put your foot under a running lawn mower either.
> You can, but you shouldn't.

Anything to win.
If I know people will be testing with ponder on, on a single-CPU machine, then
there is every reason to annoy my opponent by searching at high priority.

>>How do you measure progress without reproducibility?
>
>If trying to find out which program is better, A or B, reproducibility is _not_
>an issue.

I said _progress_, not who is better.

> Do you _really_ think that if you play me as a human, that I am going
>to play the same moves every time you do?  Yet even in spite of that lack of
>reproducibility, you can't tell whether you are better than I am?
>Reproducibility is great for debugging.  Not necessary for strength
>measurements.

First of all, I don't know why you keep comparing with humans; just because
reproducibility is impossible for humans doesn't mean it has to be for machines.
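For machines, reproducibility is largely a matter of seeding the randomness. A minimal sketch (the book list and function name are hypothetical, not Crafty's actual mechanism):

```python
import random

def pick_book_move(book, seed=None):
    """Pick a 'random' book move; with a fixed seed the same
    move comes out on every run -- reproducible by design."""
    rng = random.Random(seed)  # fresh, deterministically seeded generator
    return rng.choice(book)

book = ["e4", "d4", "c4", "Nf3"]
runs = [pick_book_move(book, seed=42) for _ in range(5)]
print(runs)  # all five picks are identical
```

Leaving `seed=None` gives the varied play Bob describes; passing a fixed seed gives identical runs for debugging, so the two goals need not conflict.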

>>
>>Say he wants to see how much changing the hash size means for Crafty - he can't
>>conclude anything due to the learning.
>
>
>So how are you going to get reproducible results with crafty?  My book _always_
>has a randomness element in it.  The _search_ has a random element since it is
>based on processor timing info that can vary from one game to another by a few
>fractions of a second each move, which can have an impact on moves chosen by the
>search.

This reminds me of politics, "we can't stop polluting so let's not even try to
limit it".

I never could see the logic in that, but then again I'm not a politician.

>>
>>Say he changes some evaluation parameters and wants to see if Crafty plays
>>better - he can't conclude anything due to the learning.
>>
>
>
>Then he can't learn anything at all as there is no reproducibility in Crafty if
>the book is used.

Right right right, now you are getting it. :)

Disable the book!

> So use the best book with learning turned on to get the
>_best_ non-reproducible result.

Unacceptable.

>>
>>Not in testing analysis power.
>
>
>You haven't given one practical idea for finding out which engine is best for
>analysis.

Read above somewhere.

>I don't begin to buy "the engine that does the best on random
>positions" because that is _not_ true for humans.  And it isn't true for
>computers either.
>
>>For the long run development and to be strong in general analysis I think it is
>>interesting to investigate and improve the weak points also.
>>
>I wouldn't disagree.  But sometimes a strong point and a weak point are
>orthogonal to each other.  You can't do both well.  So you pick one to do well,
>and avoid the other.

Suppose two engines have the same tournament performance (Elo) when playing
_with_ books, but one engine has a lot of weak areas that its very narrow book
helps it to avoid.

Now suppose, as it happens, you don't need an engine for playing full matches
but only for analysis; then you'll want the engine with the fewest weak points,
agreed?

-S.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.