Computer Chess Club Archives




Subject: Re: Proving something is better

Author: Vincent Diepeveen

Date: 16:36:09 12/17/02

Go up one level in this thread

On December 17, 2002 at 19:10:42, Dann Corbit wrote:

Dann, if your goal is to measure what works better for your program
in real life, then of course playing games against other
programs than your own thing proofs most.

But what is most important when you play another program?
Right: the quality of the move within the time you have.

Of course that testing is very time consuming.

So if you want to start proving, before doing the
game playing test, that it also is losslessly solving
more positions for example, then the only way to do
that is in a fixed time.

It isn't better then to solve 10 positions in 0:01 and 10 not
versus another algorithm solving 12 positions in 0:10 and 8 not.

What matters is whether you find it within the time control you play.

The human time control is like 3 minutes a move, but all the big mistakes
programs make are basically first move out of book. I had to my big
shame again 2 big examples of that last tournament i played (90 0 level)

I do not know about your program but in DIEP that means something like
between 5 and 15 minutes. 5 minutes at least at 90 0 and like 15 minutes
at 3 minutes a move like i play in paderborn ipccc2003 (40 in 2 + 1 all).

It is incorrect to compare number of nodes for an obvious reason.

suppose i want to show my parallel algorithm works fine. If i make
some crappy algorithm which basically takes care just 1 processor
searches then i can conclude my parallel algorithm is performing

The only decent way to measure my speedup is by looking at the
search times.

Of course the hard thing with parallellism is that some of the
researchers (and i do NOT mean Hyatt here) first slowed down
their program 30 times in order to then get a better speedup.

That's another issue which is hard to overcome.

Some zugzwang articles are very convincing there how to NOT do it,
just as bad is Cilkchess which i could see with my own eyes getting
5000 nodes a second at 1 cpu and a version not using cilk running
at a single processor getting far over 100k nps :)

Testing will never be easy as long as researchers want to present something
crappy as good, because only a crappy way of testing then can conclude
it is good.

Another method to show your thing works great is by not comparing with
a good other algorithm.

Suppose i want to conclude my algorithm works great, why not compare it
with minimax?

That's of course a great way to show my thing works good.

>On December 17, 2002 at 18:55:35, Bruce Moreland wrote:
>>On December 17, 2002 at 18:06:09, Dann Corbit wrote:
>>>Time to solution and number of solutions is the key for test problems (to my way
>>>of thinking).  If we count nodes, the slow searchers will look very bad.
>>>I think a bigger problem with test suites is that there is a plethora of
>>>excellent tactical test suites and a dirth of excellent positional test suites.
>>>So we measure tactical prowess effectively, but this does not necessarily
>>>translate into excellence in play.
>>Counting nodes means nothing if you are going to compare them between programs,
>>which Omid is not doing, so no knock on Omid there.
>Neither by me.  I think his paper was genuinely interesting.
>>With this kind of thing, what you are trying to do is get (presumably) the same
>>result in less time.
>>Less nodes is not *quite* the same as less time, even with the same program.
>>It's like measuring someone's height by measuring their shadow.  It is more
>>direct to measure their height, even though measuring their shadow will get you
>>Getting to the same depth in shorter time is also not quite the same, if you
>>don't get at least as many problems correct.
>>Neither is getting to the same depth in more time and then saying this is better
>>because you find more solutions.
>>You have to compare apples with apples or you've *proven* nothing, all you've
>>done is *implied* something.  That's not science.
>>This has nothing to do with "excellence" in play.  The whole idea is to take
>>what excellences is there and make it faster.
>>Is this technique faster?  The data doesn't say.
>>And this is a chronic problem when people write articles on search techniques.
>I think perhaps a good measure of ability would be to take a set such as WAC and
>normalize it with a good engine on a platform of known strength.  The time to
>complete would be (perhaps) 5 seconds per position, and the square root of the
>sum of the time squared would be used as a measure.
>Let's suppose that on a 1GHz machine, Crafty solves 297/300 and that the square
>root of the sum of the time squared was 300.  If two program solve an equal
>number of problems, then we use the time for a measure of goodness.  If not,
>then the number of solutions will be more important.
>Now, we will have a test that should be fairly reproducible.  Repeat this test
>procedure for a dozen or so test sets.
>After all, when playing chess, two things are important:
>1.  Getting the right answer.
>2.  Getting it fast.
>If other programs were tested under a similar setup, we might find some
>interesting results.  For instance, if one program averages 1/10 of a second to
>solve problems, even though it solves the same number, it would probably
>dominate over a program that takes 1 second on average to solve them.  Of
>course, it might not scale cleanly to longer time controls, but it seems nobody
>has the patience to test them like that.
>I suggest taking the square root of the sum of the squares to reduce the effect
>of sports that are abnormal either in quickness or slowness to solve.  Then the
>general ability will be more clearly seen.  A straight arithmetic average could
>easily be bent by outliers.

This page took 0.01 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.