Computer Chess Club Archives



Subject: Re: Knee jerk reaction!

Author: Robert Hyatt

Date: 07:34:55 09/13/04

Go up one level in this thread


On September 13, 2004 at 05:17:39, Sune Fischer wrote:

>On September 12, 2004 at 20:49:30, Robert Hyatt wrote:
>
>>>I won't go too much into this again, I think Sandro and I already have discussed
>>>this :)
>>>
>>>> It
>>>>won't predict how well it analyses.
>>>
>>>Why not?
>>
>>Because you force it into positions it won't normally play, against programs
>>that also get forced into odd positions they won't normally play.  How will
>>making decisions about that tell you anything about which is the best for
>>analysis???
>
>But Bob, I _want_ to force into positions it doesn't normally play, that is the
>_idea_.

That is the _flawed_ idea.  Because the _opponent_ is forced into similar
positions.  Suppose the opponent does worse.  Does that mean this program is
better?  Not if you play them normally.

Put two programs into positions their authors didn't intend, and the results
won't mean much, which was my point...

If you think a program should be able to play all positions well, I agree, so
long as "should" is the operative word.  Replace it with "does" and the claim
becomes false.

>
>When you use a specially designed very narrow book it might never ever play d4,
>but lots of people are interested in d4 openings and want to see how the engine
>does here.


How about convincing Korchnoi to not play 1. d4 in an important game?


>
>In analysis the engine cannot pick and choose its own narrow set of test
>positions, you can kick and scream all you want but it will _have_ to be good on
>a wide range of very different types of positions.

No it doesn't, and that is the flaw in your assumption.  You might _want_ it to
be good in a wide range of positions, but that won't make it so, for any program
around.


>
>>
>>No, giving it _the_ book customized for it will tell you how well it can play.
>>Giving it an odd book most likely will weaken the thing.
>
>I'm not interested in how it plays from an optimized, error free and very narrow
>selection of opening positions.
>I'm interested in how it plays on a wide bushy selection of openings that are
>roughly equal.



That is an impossible condition (roughly equal).  What is "equal" depends on the
program.


>
>Because I don't intend to use it in tournaments (only the author is allowed to
>do that), my purpose is to use it for analysis!

Again, to the man who has a hammer, _everything_ looks like a nail.  Chess
programs are not particularly good "general solutions" to the chess problem...


>
>>I don't see how you
>>can use _games_ to predict how well a program can do in analysis...  The two are
>>not related directly.  I've known plenty of strong players that couldn't explain
>>a thing, and weaker players that could point out problems very clearly...
>
>I use games because I don't believe it is possible to generate a representative
>set of test positions.

If you can't produce a set of positions, then how is it possible to do the same
by choosing random openings instead???


>By playing from lots of equal but complicated endgame positions you will be able
>to tell who is the better endgame player.

How, if one is tactically weaker but much stronger in endgames?  You won't be
reaching many endgames...

>
>It's true that time management and a few other game-dependent things might make
>this different than pure analysis mode, but hey, nothing is perfect.
>
>>>No but it will reveal its strong and weak points which might be of interest to
>>>the user.
>>
>>What user is qualified to figure that out?  When the programs are so much
>>stronger than 99.9% of the humans that are trying to figure this out...
>>
>>IE bozos can't really decide which brain surgeon is the best...
>
>Here is how this bozo would do it, he would test it, look at the result,
>perhaps grind some statistics and draw his conclusions.
>Standard procedure really. :)

So you count 1-0, 0-1 and 1/2-1/2???

Not always the best system when trying to see which program is better
positionally...
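
For what it's worth, the "grind some statistics" step Sune describes is easy to sketch.  This is only an illustration with made-up numbers (a hypothetical 100-game match), and, as argued above, it measures overall match strength rather than positional quality:

```python
import math

def match_stats(wins, draws, losses):
    """Score percentage, standard error, and implied Elo gap for one match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Standard error of the mean game score (each game scores 0, 0.5 or 1).
    mean_sq = (wins * 1.0 + draws * 0.25) / n
    se = math.sqrt((mean_sq - score ** 2) / n)
    # Logistic Elo model: expected score = 1 / (1 + 10^(-elo/400)).
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, se, elo

# Hypothetical 100-game match: 45 wins, 30 draws, 25 losses.
score, se, elo = match_stats(45, 30, 25)
print(f"score {score:.1%} +/- {se:.1%}, about {elo:+.0f} Elo")
```

Note that nothing in those three numbers says *why* the score came out that way, which is exactly the objection being raised here.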


>
>>>You are the one who has been saying it makes _no_ sense to play without own
>>>books, I'm just trying to show you that there are reasons to play without them.
>>
>>I haven't seen a single good example of how/why doing this provides useful
>>information.
>
>The reason is simply that you only find one single experiment to be interesting,
>while I can think of more than 20 different ways to test engines that would
>enable me to say more about their characteristics, their strengths and their
>weaknesses.
>
>Why you don't find these experiments interesting I have no idea.

There is a difference between _ME_ doing experiments to figure out what to do to
make my program better, and an end-user doing such experiments and simply
drawing conclusions from them.  I tailor experiments to explore specific things.
 I don't just run random tests to see what happens...


>
>>>My job is easy, I just need one single counter example :)
>>
>>No, you have it backward.  You are saying something is OK.  I have given more
>>than one counter-example of why it is _not_ ok...  You can't give one example of
>>why it _is_ ok and then conclude "it is ok."
>
>I can certainly conclude that "it is ok" in that special case, the case we
>happen to be talking about in fact.

But if it isn't ok in _all_ circumstances, it is flawed.




>
>>>>It is taking hokey positions, making programs play them against each
>>>>other, and then trying to draw conclusions from that.  The two are _not_ the
>>>>same thing.
>>>>Ditto for learning on/off, pondering on/off, etc...
>>>
>>>I disagree.
>>
>>Then we just have to agree to disagree.  My experience leads me to one
>>conclusion.  Based on writing several programs, playing in all sorts of
>>competitions, etc...
>
>Of course you can draw conclusions.
>
>Say Crafty plays 10000 games against Fritz everything on, own books, learning
>etc..
>
>Fritz wins 70%-30%.
>
>Now we disable books and use a selected wide variety of opening positions,
>with sides switched to make everything _equal_.
>
>Fritz wins, but only 55%-45%.
>
>Now we do it again, this time with learning disabled for both
>
>Fritz wins, 83%-17%.
>
>If you are telling me that you cannot conclude anything from that, then yes I
>will certainly claim full disagreement.

Then we disagree.  Was the book bad, and learning helped?  Was the book good,
but randomness produced bad luck?  Was the program bad but got good openings?
Was the program bad and got bad openings?  I don't see how to conclude anything
from those results.  If I look at the games themselves, I learn far more.
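
On the mechanics of the experiment Sune describes: "with sides switched" is normally done by playing every opening twice with colors reversed, so neither program is stuck with the worse side of an unbalanced line.  A minimal scheduling sketch (engine and opening names are hypothetical):

```python
# Each opening is played twice with colors reversed, so any imbalance
# in the starting position cancels out over the pair.
openings = ["1.e4 e5", "1.d4 Nf6", "1.c4 c5"]   # hypothetical test set
engines = ("EngineA", "EngineB")                # hypothetical engines

schedule = []
for opening in openings:
    schedule.append((engines[0], engines[1], opening))  # EngineA has White
    schedule.append((engines[1], engines[0], opening))  # colors reversed

for white, black, opening in schedule:
    print(f"{white} (White) vs {black} from {opening}")
```

This balances the openings, but of course it does nothing about the deeper objection above: both engines are still being forced into lines they would never choose for themselves.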

>
>Moreover I find it interesting to study where each engine has its strong and
>weak points.
>
>>>
>>>>>>No endgame tables?
>>>>>
>>>>>There is no room for endgame tables on his laptop.
>>>>
>>>>Baloney.  I have a sony VAIO with a 20 gig hard drive.  I have _all_ the 3-4-5
>>>>piece files on it...  20 gig drives are small today.
>>>
>>>I have a 10 GB drive and it is full.
>>
>>You made that choice.  You _could_ get the 5 piece tables on there if you
>>_wanted_ them.  That is the point...  It isn't a matter of "can't".  It is a
>>matter of "don't want to".
>
>Wrong. It's a matter of "I can't" because I also need a few other programs for
>work-related stuff. I could settle for the 3/4-man tables, but really that disk
>is so slow it's not even funny.
>
>>>
>>>To take another example, how are you going to use endgame tables on the
>>>PocketPC?
>>>http://www.pocketgear.com/software_detail.asp?id=15142
>>
>>In 5 years the answer will be obvious.. :)
>
>Nevertheless there is currently a reason to test without endgame tables.
>I guess I can rest my case here :)

Care to guess how many pocket-users there are compared to normal users?


>
>>>>So?  I do ponder=on matches on my single-cpu laptop all the time.  No problems
>>>>at all
>>>
>>>How do you make sure they get 50% cpu each?
>>
>>I don't.  I trust the O/S to do that.  I just watch something like "top" to be
>>sure it is correct most of the time.  If one chooses to not ponder for some
>>reason, oh well...
>
>I can think of many interesting experiments, but this experiment I would have to
>call crap.

Ponder = on, one cpu?  Hardly crap at all.  Works like a charm.

>
>>>
>>>What happens when one engine hits ETGB or runs a high priority thread?
>>
>>You aren't going to run a "high-priority" thread on a real O/S, unless you are
>>running as a privileged user.  If so, that is so far beyond stupid as to not
>>need any explanation.  Easy way to lose the whole system, so it shouldn't be
>>done.  Of course you should not put your foot under a running lawn mower either.
>> You can, but you shouldn't.
>
>Anything to win.
>If I know people will be testing with ponder on a single cpu machine, then there
>is every reason to annoy my opponent with searching at high priority.

I'll play you a match on my linux box.  Feel free to start a high priority
thread, but you won't be running as root and might find it difficult...



>
>>>How do you measure progress without reproducibility?
>>
>>If trying to find out which program is better, A or B, reproducibility is _not_
>>an issue.
>
>I said _progress_, not who is better.
>
>> Do you _really_ think that if you play me as a human, that I am going
>>to play the same moves every time you do?  Yet even in spite of that lack of
>>reproducibility, you can't tell whether you are better than I am?
>>Reproducibility is great for debugging.  Not necessary for strength
>>measurements.
>
>First of all I don't know why you keep comparing with humans, just because
>reproducibility is impossible for humans it doesn't have to be for machines.

hmmm...  that is _my_ goal, in fact...



>
>>>
>>>Say he wants to see how much changing the hash size means for Crafty - he can't
>>>conclude anything due to the learning.
>>
>>
>>So how are you going to get reproducible results with Crafty?  My book _always_
>>has a randomness element in it.  The _search_ has a random element since it is
>>based on processor timing info that can vary from one game to another by a few
>>fractions of a second each move, which can have an impact on moves chosen by the
>>search.
>
>This reminds me of politics, "we can't stop polluting so let's not even try to
>limit it".
>
>I never could see the logic in that, but then again I'm not a politician.
>
>>>
>>>Say he changes some evaluation parameters and wants to see if Crafty plays
>>>better - he can't conclude anything due to the learning.
>>>
>>
>>
>>Then he can't learn anything at all as there is no reproducibility in Crafty if
>>the book is used.
>
>Right right right, now you are getting it. :)
>
>Disable the book!

Then he _still_ can't learn anything, because now we reach positions that Crafty
would not normally reach.  Or if you start from move 1 with no book, you just
get the same game over and over, which might not be so easy to understand...


>
>> So use the best book with learning turned on to get the
>>_best_ non-reproducible result.
>
>Unacceptable.
>
>>>
>>>Not in testing analysis power.
>>
>>
>>You haven't given one practical idea for finding out which engine is best for
>>analysis.
>
>Read above somewhere.
>
>>I don't begin to buy "the engine that does the best on random
>>positions" because that is _not_ true for humans.  And it isn't true for
>>computers either.
>>
>>>For the long run development and to be strong in general analysis I think it is
>>>interesting to investigate and improve the weak points also.
>>>
>>I wouldn't disagree.  But sometimes a strong point and a weak point are
>>orthogonal to each other.  You can't do both well.  So you pick one to do well,
>>and avoid the other.
>
>Suppose two engines have the same tournament performance (Elo) when playing
>_with_ books, but one engine has a lot of weak areas that the very narrow book
>helps to avoid.
>
>Now as it happens you don't need an engine for playing full matches but only for
>analysis, then you'll be wanting the engine which has the fewest weak
>points, agreed?

If you could find such a thing, yes.  I just don't believe the above
circumstance exists in an easy to find way...


>
>-S.




Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.