Computer Chess Club Archives



Subject: Re: Chess Tiger 15 vs Fritz7

Author: Uri Blass

Date: 10:28:30 06/11/02



On June 11, 2002 at 12:36:41, Christophe Theron wrote:

>On June 11, 2002 at 01:42:10, Uri Blass wrote:
>
>>On June 10, 2002 at 22:29:44, Christophe Theron wrote:
>>
>>>On June 10, 2002 at 15:28:14, Rajen Gupta wrote:
>>>
>>>>I have read somewhere (I think it was hinted at in one of the interviews Frans
>>>>Morsch gave to one of the Indian newspapers) that at any given time there are
>>>>several different versions of Fritz being developed, the inference being that
>>>>the one that is actually released is not necessarily the strongest one; it's
>>>>the one that is just strong enough.
>>>>
>>>>Frans Morsch apparently has one ready whenever a new upstart arrives on the
>>>>scene. I won't be surprised if there is no new Fritz until something overtakes
>>>>the current version.
>>>>
>>>>Rajen
>>>
>>>
>>>
>>>It does not make sense.
>>>
>>>Look at the small margin between Fritz and the program just behind it (Tiger) on
>>>the SSDF.
>>>
>>>Why would Frans take the risk of publishing an engine that might fail to take
>>>first place on the SSDF list if he has something better?
>>
>>Maybe he does not know which engine is the best.
>>
>>The only way to be sure that engine A is better than engine B is by games.
>>You can always use other tests to guess, but they only give an estimate.
>>
>>I know you say that you do not use games against other opponents, but I think
>>that is a mistake.
>>
>>The fact that you probably have some test that usually gives the same results
>>as games is a good reason to use that test for evaluating a single change, but
>>when you decide to release a new version, the only way to be sure that it is
>>better is a lot of games (unless the change only makes Tiger faster).
>>
>>Uri
>
>
>
>In order to have a top chess program you must have a method to decide if a
>change is an improvement or not. One of the requirements of this method is that
>you must be able to get a result in a short period of time (preferably less than
>4 days in the most difficult cases).
>
>There are many little changes to test before you get a version significantly
>stronger than your last release.

I believe it is also possible for a single change to give a significant
improvement.

I believe I have ideas that could make every program more than 30 Elo stronger
through a change in the search rules, but the relevant code is not trivial to
write, and today I am more interested in what I can gain from simple changes in
the evaluation (I mean simple to code, which does not mean simple to think of).
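For scale, the usual logistic Elo model maps a rating difference d to an expected score E(d) = 1 / (1 + 10^(-d/400)). A minimal Python sketch of that formula (the function name is illustrative, not taken from any particular program):

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff points above the
    opponent, using the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

for d in (10, 30, 100):
    print(f"{d:>4} Elo -> expected score {expected_score(d):.3f}")
```

Under this model a 30 Elo gain means scoring only about 54.3% against the old version, so a small match cannot reliably tell the two apart.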

I believe there is a lot of room for improvement both from changes in the search
rules and from changes in the evaluation.

I agree that you cannot test every small change by games, and it is important to
have a test that estimates whether a change is good or bad without games, but you
can still test a large set of changes together by games, which is worth doing
because you cannot be sure that your estimate is right.
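As an illustration of judging a set of changes by games, here is a minimal sketch that turns a match result (wins, draws, losses) into an Elo estimate with an approximate 95% interval. It assumes the logistic Elo model and a normal approximation for the average score; the function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Invert the logistic Elo model: average score in (0, 1) -> Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def _clamp(p: float, eps: float = 1e-9) -> float:
    # Keep probabilities strictly inside (0, 1) so the log stays finite.
    return min(max(p, eps), 1.0 - eps)

def match_estimate(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a match, using a normal approximation
    for the average score (z = 1.96 gives roughly a 95% interval)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score, where each game is worth 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    return (elo_from_score(_clamp(score)),
            elo_from_score(_clamp(score - z * se)),
            elo_from_score(_clamp(score + z * se)))

elo, low, high = match_estimate(60, 20, 20)
print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} .. {high:+.0f}")
```

With 60 wins, 20 draws and 20 losses it reports about +147 Elo with an interval of roughly +86 to +218, so even a lopsided 100-game match leaves a wide margin of error.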

I believe that some changes are productive at long time controls and not at short
time controls, and vice versa (for example, I do not believe that the best
evaluation weights for blitz are also the best evaluation weights for long time
controls).

>
>It is not practical to let people test several versions and decide for you
>because you can't rely on results you have not controlled yourself (there are
>too many possibilities of inconsistencies even in the experiments you set up
>yourself) and because these people would have to play a lot of games under
>equivalent conditions in order to get statistical relevance (which you seldom
>get, because you cannot ask people to play 500 games in a row).

You do not need to ask people to play 500 games in a row if you have enough
testers.

You can use several testers who each play fewer games instead of one tester who
plays 500 games in a row.
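A rough sketch of splitting the work among testers, assuming all of them test under equivalent conditions so their results can simply be added. `games_needed` is a rule of thumb, not a proper power calculation: it asks how many games make a 95% interval on the score just exclude 50% when the true difference is `elo_diff`, using sigma = 0.5 (the per-game standard deviation of an even match with no draws) as a conservative bound. The tester data is made up for illustration.

```python
import math
from typing import Iterable, Tuple

Result = Tuple[int, int, int]  # (wins, draws, losses)

def pool(results: Iterable[Result]) -> Result:
    """Add up the win/draw/loss counts reported by several testers."""
    wins = draws = losses = 0
    for w, d, l in results:
        wins, draws, losses = wins + w, draws + d, losses + l
    return wins, draws, losses

def games_needed(elo_diff: float, sigma: float = 0.5, z: float = 1.96) -> int:
    """Rule of thumb: games needed so that a z-sigma interval around the
    expected score just excludes 50% when the true gap is elo_diff."""
    delta = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0)) - 0.5
    return math.ceil((z * sigma / delta) ** 2)

# Hypothetical reports from three testers, 100 games each.
testers = [(40, 35, 25), (30, 45, 25), (52, 30, 18)]
print(pool(testers))      # (122, 110, 68)
print(games_needed(30))
```

Under these assumptions a 30 Elo difference needs a bit over 500 games. Draws lower sigma and therefore the count, while a test meant to detect the difference reliably, rather than about half the time, needs more.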

>
>I cannot believe that a serious chess programmer would use such a lousy
>selection method.

I do not know, but if the programmer works in two different directions, then it
is possible that there are two clearly different versions, each with many
changes.
>
>Testers' feedback is very valuable to spot problems or gaps in the program's
>knowledge, bugs, and more generally to get good advice on general directions to
>work on.
>
>Testers' feedback is used to get quality data, human advice and creativity; you
>generally cannot use it to get a quantity of statistically relevant data.

The problem that I found in the games of Palm Tiger played by Jorge Pichard is
that the computer was slowed down by a significant factor in some of the games,
but it is possible to find this problem just by looking at the games, even
without analyzing them, because the games included the depths that the programs
reached.

I believe that it is possible to get statistically relevant data if you have
testers who give you all the relevant information: not only the games but also
the depths that the programs reached and the number of nodes that they searched.
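A slowdown like the one in those games can be spotted mechanically if testers report search statistics. A minimal sketch, assuming each game comes with the total nodes searched and the thinking time used (the `GameLog` format and the 0.5 threshold are hypothetical), flags games whose nodes per second fall far below the median:

```python
from statistics import median
from typing import List, NamedTuple

class GameLog(NamedTuple):
    """Hypothetical per-game summary a tester could report."""
    game_id: str
    nodes: int        # total nodes searched by the engine in the game
    seconds: float    # total thinking time used by the engine

def flag_slow_games(logs: List[GameLog], ratio: float = 0.5) -> List[str]:
    """Return ids of games whose nodes per second is below `ratio` times the
    median nodes per second of all games, a sign the machine was slowed down."""
    nps = {g.game_id: g.nodes / g.seconds for g in logs}
    typical = median(nps.values())
    return [gid for gid, v in nps.items() if v < ratio * typical]

logs = [
    GameLog("g1", 90_000_000, 600.0),
    GameLog("g2", 88_000_000, 590.0),
    GameLog("g3", 20_000_000, 610.0),   # much lower speed than the others
    GameLog("g4", 91_000_000, 605.0),
]
print(flag_slow_games(logs))  # ['g3']
```

Using the median rather than the mean keeps a single slowed-down game from dragging the reference speed down with it.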

The machines of different testers may differ, but if you use the same testers to
test more than one version for comparison, then I see no reason why you cannot
get statistically relevant data.
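One way to use the same testers for a comparison is a paired design: each tester runs both versions on the same machine, and only the per-tester difference is used, so differences between machines cancel out. A minimal sketch, assuming each tester reports a score fraction for each version from equal-sized test runs (the data layout and names are hypothetical):

```python
import math
from statistics import mean, stdev
from typing import Dict, Tuple

def paired_difference(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """scores maps tester -> (score of old version, score of new version),
    both measured on that tester's machine. Returns the mean per-tester
    improvement and its standard error (needs at least two testers)."""
    diffs = [new - old for old, new in scores.values()]
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))

# Hypothetical results from three testers.
results = {"t1": (0.50, 0.55), "t2": (0.48, 0.52), "t3": (0.53, 0.56)}
m, se = paired_difference(results)
print(f"mean improvement {m:.3f} +/- {se:.3f}")
```

A mean difference of more than about two standard errors suggests a real improvement; with only a few testers, a t-distribution would be more appropriate than this normal rule of thumb.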

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.