Computer Chess Club Archives


Subject: Re: Why SSDF list is the best

Author: Sandro Necchi

Date: 05:04:08 07/17/05


On July 17, 2005 at 07:04:21, Pallav Nawani wrote:

>On July 17, 2005 at 05:22:47, Sandro Necchi wrote:
>
>>I have been laughing a lot (maybe crying over the ignorance would have been more
>>appropriate?) reading many wrong statements about testing and Elo lists.
>
>I guess you know better than us, but your arguments hold no water.
>
>>So, for those who are new and do not know, the SSDF list is the best for the
>>following reasons:
>>
>>1. They use 2 computers and the complete program with its own book and ETG,
>>with its own GUI and the best settings as suggested by the programmer.
>
>Depends on what you want to test. If you want to understand the strength of the
>_complete package_, then yes, using its own GUI and its own book with learning
>on is the best way.

Most users want exactly this.

>However, testing the strength of the _engine_ in isolation also
>means using the openings where it might not play as well. This is also a valid
>way of testing, despite claims to the contrary.

Yes, if one wants to make a list of weaknesses and strengths for engines.

>
>>2. They use long time controls (40/2h 20/1h; international level) only.
>
>Irrelevant.

I do not agree.

One example:

Shredder books have been made for long time controls. I mean the selection of
the moves has been made for that, so the book would be less effective in blitz
games.

This shows your statement is false.

>For _rating_ (mind you, _ratings_) any time control is good enough
>as long as it is not so short that programs lose on time.

There is a way to avoid this...a good one.

>CEGT time control is
>long enough, IMHO. What is important is having enough number of games.

Quality is important as well as quantity. It depends on how the tests are made.

>SSDF does
>have a good number of games, of course, but just not enough to differentiate
>between two programs that are very close in strength.

If after 1000 games 2 programs are very close, do you really think that after
10000 they will not be the same? And if there is a difference of 2-10 points,
would that make a difference for a user?
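
The 1000 versus 10000 games question can be put into rough numbers. What follows is a minimal sketch, not anything SSDF or CEGT publishes: it assumes two evenly matched programs, a draw ratio of 30% (an illustrative guess), independent games, and a normal approximation for the 95% interval. The function names are hypothetical.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) into an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, score=0.5, draw_ratio=0.3):
    """Approximate 95% half-width, in Elo, of a match result.

    Assumptions (not from the thread): games are independent, the draw
    ratio is fixed, and the normal approximation is adequate.
    """
    win_ratio = score - draw_ratio / 2.0
    # Variance of a single game result, where win=1, draw=0.5, loss=0.
    variance = win_ratio * 1.0 + draw_ratio * 0.25 - score ** 2
    std_err = math.sqrt(variance / games)
    low = elo_from_score(score - 1.96 * std_err)
    high = elo_from_score(score + 1.96 * std_err)
    return (high - low) / 2.0

for n in (1000, 10000):
    print(n, "games: about +/-", round(elo_margin_95(n), 1), "Elo")
```

Under these assumptions the margin is roughly 18 Elo after 1000 games and a little under 6 after 10000, so a gap of 2-10 points sits inside the noise of a 1000-game result. Whether such a gap matters to a user is a separate question.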

>
>>3. They use the same hardware for all programs.
>I agree that this is a good thing, because this makes the matches more
>consistent. On the other hand, different hardware affects different programs
>differently, so (in theory) the result we have is only correct for SSDF hardware.

Of course!

>For instance, how would Crafty perform if they used 64 bit Opterons? However,
>the differences are usually small.

Well, if you can afford 2, then you are lucky!
I believe SSDF will switch to that hardware sooner or later anyway.
>
>>4. They use a very wide range of programs and not only the new ones to get more
>>reliable results.
>I think every rating list does that.

I was listing all the points which are important, not claiming that they are all
missing from other testing methods.

>
>>5. Ponder on and learning are activated.
>For rating purposes, ponder on is irrelevant, since pondering is effectively
>nothing more than giving more time to an engine.

I do not agree.
If one engine is better at guessing the opponent's reply, it can play better and
reach higher depths.
You are handicapping some engines by removing this option.

>For learning see the answer to
>your point 1.

Learning is what makes the computer more human-like and not stupid (losing
games in exactly the same way it did before).

>
>>So, anybody can test in a different way as they wish, but to claim that such a
>>system is better than or replaces the SSDF system is pure nonsense!
>
>While I agree that SSDF gives us a good idea of the complete package, claiming
>that SSDF is better is also pure nonsense. For instance, Shredder 7.04 and
>Shredder 8 are very close in the SSDF list. Do you think that Shredder 7.04 and
>Shredder 8 have the same strength?

They are different and close to each other in strength. I think the SSDF list is
correct in this case.
Also, again, Shredder 7.04 is tested in its UCI version while Shredder 8 is
tested in the CB GUI. The UCI GUI offers more performance and better learning,
so Shredder 8 UCI should be about 25 points stronger than Shredder 8 CB. I am
talking about the complete package of course.
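
To give a sense of scale for a 25 point gap, here is a small sketch using the standard Elo expected-score formula. The function name is hypothetical, and the 25 point figure is simply the estimate quoted above, not a measured result.

```python
def expected_score(elo_diff):
    """Expected score fraction for the stronger side, per the standard
    Elo logistic formula with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# 25 Elo is the estimate from the post, used here only as an input.
print(round(100.0 * expected_score(25), 1), "% expected score")
```

A 25 point edge corresponds to an expected score of roughly 53.6%, that is, about 53.6 points out of 100 games.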

>Every rating list has its anomalies..

Yes, but I like to be very precise in all my statements...as you can see...

>
>Best Regards,
>Pallav

Best regards
Sandro





Last modified: Thu, 15 Apr 21 08:11:13 -0700
