Computer Chess Club Archives


Subject: Re: Qualifier to my previous statement

Author: Dann Corbit

Date: 14:33:36 01/29/99


On January 29, 1999 at 17:13:04, KarinsDad wrote:
[snip]
>Sorry Dann, but you do not make sense. My statement was "if you want to compare
>engine strength versus engine strength to determine if the program got better
>and not compare whether some database has better data in it, then you should use
>the same databases" and you respond with "What you are testing here is engine
>strength versus engine strength". Of course it is. That's what I said.
Hmmm...  I sort of missed that part.  If we want to measure a bare engine's
computing power, that is what we will do.  If we want to find a program's
overall strength, then that is another question.

>Yes, I agree. There is more to a program than just its engine. But what I am
>saying is that you should compare "apples to apples" and have only one variable
>different per test. You could test CM5000 and CM6000 with the same opening books
>and same tablebases (if CM uses tablebases) for a thousand games. You could then
>test CM6000 using CM5000 opening book vs. CM6000 using CM6000 opening book for a
>thousand games to see if the opening book improved.
>
>As long as you only change one variable from the control per test, then you are
>measuring something real. You can find out if CM6000 is stronger due to a better
>opening book or a stronger engine or both.
Depends on how we define the variables.  "CM5000 out of the box with whatever
comes with it" versus "CM6000 out of the box with whatever comes with it" is one
possible test, for example.  With something like that we have a simple system
where we can even use CM6000 v CM6000 and CM5000 v CM5000 as additional
experimental controls.
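
To make "one variable per test" concrete, here is a rough Python sketch of
such a match schedule, including those self-play controls.  The configuration
names are made up for illustration, not actual CM5000/CM6000 file names.

    def schedule():
        """Yield (white, black) configurations, each "engine + book"."""
        # Out-of-the-box comparison: each engine with its own book.
        yield "CM5000 + book5000", "CM6000 + book6000"
        # Book isolated: same engine, different books.
        yield "CM6000 + book5000", "CM6000 + book6000"
        # Engine isolated: different engines, same book.
        yield "CM5000 + book6000", "CM6000 + book6000"
        # Controls: identical configurations should score ~50%, which
        # bounds the noise (and any color bias) of the whole setup.
        yield "CM5000 + book5000", "CM5000 + book5000"
        yield "CM6000 + book6000", "CM6000 + book6000"

    for white, black in schedule():
        print(f"{white:18} vs {black}")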

>There are other factors to consider. What if CM6000 changed C compilers to
>better work on Pentium IIs, but then works worse on Pentiums. Does that mean
>that CM6000 is better or worse? Depends on what type of hardware you run it on.
Could not agree more.  An experiment is only valid if it is repeatable (can
anyone say "cold fusion"?), so we should specify the conditions of the
experiment as precisely as possible.
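
For instance, here is a minimal sketch of the record each match series should
carry so that someone else can re-run it.  The fields are my guesses at what
matters, not an exhaustive list.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MatchConditions:
        engine: str        # e.g. "CM6000"
        opening_book: str  # which book file, and its version
        tablebases: str    # which endgame files were installed
        cpu: str           # a Pentium and a Pentium II may differ
        time_control: str  # e.g. "40 moves in 2 hours"
        games: int         # sample size

    spec = MatchConditions(engine="CM6000", opening_book="book6000",
                           tablebases="none", cpu="Pentium II 300 MHz",
                           time_control="40/120", games=1000)
    print(spec)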

>>>If I could take a 1200 rated playing program and give it an opening database of
>>>all moves out to 100 moves for each side (a very large database on the order
>>>of 10^320 positions) and a tablebase which can handle all positions with 12
>>>pieces on a side (another extremely large database which I cannot even guess how
>>>to calculate), then I would have a program that would never lose to Deep Blue
>>>since it would never use its search engine for anything other than looking up
>>>data out of databases.
>>Well, if you could produce such a remarkable database system, then your program
>>would have that ability.  It does not matter where the answers come from.
>>Should we get annoyed because the computer did not have to think about it but
>>instead did a simple lookup?  A chess program is a black box.  Into it go board
>>positions and out of it come board positions.  How it generated the positions is
>>not relevant in determining the strength of the program.  If we remove all
>>database entries, then we measure only engine strength.
>>
>
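
As an aside, the 10^320 figure quoted above is at least in the right
ballpark.  A quick back-of-envelope check, assuming the usual rough estimate
of about 35 legal moves per position:

    from math import log10

    branching, plies = 35, 100 * 2   # 100 moves per side = 200 plies
    print(f"35**200 is about 10^{plies * log10(branching):.0f}")  # 10^309
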
>Well, sort of. What if I play 2000 French games against a computer and 2000
>King's Indian games against a computer and I win all of the French's and it wins
>all of the King's Indians. Who is stronger? Are we the same strength? What if we
>both win all games playing as white? Who is stronger?
Nobody knows, but you do have some interesting data points.  The data points are
valid for those exact conditions and are not useful for extrapolation to other
settings, considering the variability described.

>The problem with ratings in general is that they make an approximation based on
>results, but do not break those results down into categories (except time).
>
>This is the reason GMs prepare for tournaments. They try to get their opponents
>into disadvantageous territory. They do not care about the ratings of their
>opponents, they care about their strengths and weaknesses. Hence, the propensity
>for learning programs. They do not learn chess knowledge, they learn which
>variations within an opening book lead to wins and losses for their particular
>engine (again it is the engine which is important).
I agree with your observation that experiments are often poorly controlled or
underspecified (or even cooked, for that matter).  But even having identical
books for programs A and B is not a level playing field if they do not use them
identically.  Suppose that A knows how to use the book as black or white but B
only knows how to use it as black.  Or suppose that A can jump to an alternate
line when something changes, but B can only run along a pre-determined course
until something it does not expect happens.  Suppose further that B has a
special ability to recognize when it falls *back into* the book and A does not.
In other words, identical books do not mean identical play from two programs.
The books change the ability of the engine, and disregarding that effect can
have a tremendous impact.  In fact, it could be the difference between a 2000
ELO engine and a 2500 ELO engine.
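
For perspective, the standard ELO expected-score formula puts a 500-point gap
in concrete terms (the function name here is mine):

    def expected_score(r_a, r_b):
        """Expected score of player A against player B under the Elo model."""
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    print(round(expected_score(2500, 2000), 3))  # 0.947, about 19 of 20 games
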
The same is true for tablebase files.  Having identical tablebase files will
not guarantee equal play.  Suppose that engine C knows how to reduce a position
that is not in the tablebase into one that is, by forced exchanges or whatever,
and engine D does not.  Or suppose that engine E can use a tablebase on half of
the board by using symmetry.  Or suppose that engine F can reverse the roles of
black and white in the tablebase files and get twice as many positions out of
them, and engine G cannot...

It should be obvious that the opening database and the endgame database are an
integral part of the ability of the programs to play the game of chess.
Otherwise, what we are talking about _really_ is just a problem-solving
program.  Such programs are especially sharp at finding nearby mates or
tactical coups, but are completely unable to play a full game of chess.  If we
throw out the opening books and endgame tablebase files, we are *not* measuring
what the programs can do, since those files are an integral part of the program
design.
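
In code terms, the lookups sit in front of the search, so removing them tests
a different program.  A minimal sketch, with dictionaries standing in for real
book and tablebase files (all names hypothetical):

    def choose_move(position, book, tablebase, search):
        """Prefer stored knowledge; search only when neither database hits."""
        if position in book:
            return book[position]       # opening book lookup
        if position in tablebase:
            return tablebase[position]  # endgame tablebase probe
        return search(position)         # the engine proper

    # Stripping the databases (pass {} for both) exercises only `search`,
    # which measures the bare engine rather than the program as shipped.
    move = choose_move("startpos", book={"startpos": "e2e4"},
                       tablebase={}, search=lambda p: "search-move")
    print(move)  # -> e2e4, straight from the book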

>>>Would CM6000 be stronger than CM5000 with a stronger opening database? Most
>>>likely. Is it a fair test to compare CM5000 with CM6000 with them both using the
>>>same opening database? Of course. That's the point. If CM6000 has an inferior
>>>engine to CM5000, but had a far superior opening book, it could still win
>>>games due to being in a superior position out of the opening.
>>No more or less fair than testing with different opening database or endgame
>>tablebase systems or whatever.  You should describe in the test the full nature
>>of the variables of the experiment, but if you want to find out how well a
>>program plays chess, you do not remove the data it normally has at its disposal.
>> It will play far worse than it is capable of.
>>
>
>Agreed. The question I am trying to answer is: Have they gotten better? You are
>looking at whether they have gotten better overall. Fair enough. I am looking at
>whether the engines (and hence the algorithms) are getting better, or if it is
>merely a matter of better databases. The best way to answer that is not just to
>run CM5000 versus CM6000, but rather to isolate each component.
I think that the algorithms extend to the opening books and tablebase files.

>>>The difference between humans and programs is that the opening book of a human
>>>is an integral part of him whereas this is not the case with a program. A
>>>program can use any opening book (in the appropriate format) or none at all. You
>>>cannot compare the two.
>>A human's opening book also changes over time.  You can learn new openings or
>>you can forget how to use an opening you have not used for a while.  You can
>>also have holes in your opening book just like a computer.
>>Furthermore, *my* opening book is a microscopic fraction of the opening book of
>>a GM. Is it fair to have us play against each other when his opening book is
>>much larger?  Of course it is.  If I want a bigger {internal} opening book, I
>>should study more.  And if I simply lack the capability to gather an opening
>>book the size of Seirawan's or Karpov's or whomever's, then tough -- I just have to get
>>along with what I am capable of mustering.
>
>You know, most of these discussions are debates in semantics. People here tend
>to compare computers and humans similarly when they want to and then turn around
>and compare them differently at other times.
>
>The two are really quite dissimilar machines playing the same game.
>
>For example: computers do not need "insufficient losing chances" types of rules.
>These rules were added for humans with human feelings. Either you win within
>time, or you draw, or you lose.
>
>A delayed clock is designed for a human, not a computer.
>
>Computers do not need rules on writing down the moves, they can do it
>(effectively) effortlessly.
>
>Computers make logs. Humans making logs during a game is considered cheating.
>
>Computers look up exact moves in a database. The choices in a given opening book
>will not change (assuming no learning and no intervention). Humans can play the
>same opening for 20 years and suddenly make a blunder in the opening.
>
>Humans get tired and sick. Computers do not.
>
>Humans will (most often) stop playing if a hurricane threatens. Computers will
>not.
>
>There are many differences between the two. Whenever you make an analogy for one
>based on the other, there is a tendency for it to be semantics and nothing more.
Well, as a wise man once said, "They're exactly the same, only completely
different."



