Computer Chess Club Archives


Subject: Re: Comments on SSDF by Mr.Diepeveen * The Two Computer Quest

Author: Rolf Tueschen

Date: 16:43:16 03/06/04


On March 06, 2004 at 14:38:38, Mridul Muralidharan wrote:

>On March 06, 2004 at 12:42:24, Uri Blass wrote:
>
>>On March 06, 2004 at 12:21:11, Mridul Muralidharan wrote:
>>
>>>On March 06, 2004 at 06:08:31, Rolf Tueschen wrote:
>>>
>>>>On March 05, 2004 at 19:44:56, Ed Schröder wrote:
>>>>
>>>>>On March 05, 2004 at 18:23:44, Vincent Diepeveen wrote:
>>>>>
>>>>>>On March 05, 2004 at 15:51:47, Thoralf Karlsson wrote:
>>>>>>
>>>>>>>On March 05, 2004 at 03:54:57, Afzal Siddique wrote:
>>>>>>>
>>>>>>>>Hello All,
>>>>>>>>
>>>>>>>>http://www.aceshardware.com/forum?read=105063596
>>>>>>>>
>>>>>>>>Afzal
>>>>>>>
>>>>>
>>>>>>>I have never asked Vincent Diepeveen for money in order to test his program.
>>>>>
>>>>>>That is correct. You told me that you were so short of hardware that,
>>>>>>without another machine or two, you could not guarantee me that Diep
>>>>>>would be on the list soon.
>>>>>
>>>>>So from that statement (we don't have enough PC's to test Diep) you concluded
>>>>>Thoralf was asking you to send him 2 PC's?
>>>>>
>>>>>Ed
>>>>
>>>>
>>>>Dear Ed,
>>>>
>>>>please stop playing these games and let me answer for VD. Yes, of course, a
>>>>young programmer MUST understand the words of TK this way! Period.
>>>>
>>>>VD wanted to reach the goal that his program was tested as soon as possible.
>>>>Understandable wish. Now the responsible man from SSDF says that he could only
>>>>do this if he had more resources - BUT of course he hasn't yet. OF COURSE this
>>>>does NOT say word for word that Vincent should send hardware as soon as
>>>>possible. But the implication is absolutely clear. Had VD done
>>>>this, the SSDF certainly wouldn't have rejected the kind present.
>>>>
>>>>But all this is only part of the overall general problem!!!
>>>>
>>>>And we should thank VD that he has published this here in CCC. The other aspect
>>>>of the problem is that a company like ChessBase has more resources than just a
>>>>young programmer. THEY make an invoice with an autoplayer and whoopieee, the
>>>>SSDF is accepting it. Later it was found out that this autoplayer gave FRITZ an
>>>>edge. At that moment also Ed Schröder began to jump up and down. A kind of war
>>>>began.
>>>>
>>>>So here we come to the final aspect of this problem. Speaking in terms of
>>>>history. Overall, these parts of the "SSDF problem" could be defined as follows:
>>>>
>>>>   *** the SSDF is held by amateurs, by certainly sympathetic hobby freaks
>>>>
>>>>   *** due to a lack of resources SSDF had to test by hand in the early days
>>>>
>>>>   *** due to that same aspect SSDF became open for manipulative tools
>>>>
>>>>   *** in consequence gifts of hardware alone _could_ influence the results
>>>>
>>>>This is all, what Vincent is saying and this is correct. If the SSDF were really
>>>>independent, they would test completely without contacts with the programmers.
>>>>They would buy the progs in shops and they would test them. They would test in
>>>>the spirit of the potential clients, the end-users. The whole communicating with
>>>>the programmers and their companies makes the SSDF object of almost invisible
>>>>manipulations.
>>>>
>>>>Here, too, Vincent gave perfect examples. The invoice of special "books" and
>>>>"learning files" is obviously a tool to manipulate the results of the tests
>>>>because the programmers want to respond to the way their colleagues, with
>>>>newer program versions, have reacted to _their_ progs. Obviously this no
>>>>longer has anything to do with independent and reliable testing standards.
>>>>
>>>>To make this absolutely clear: a test, once begun, does NOT allow a tester to
>>>>later make all kind of replacements or manipulative novelties because that
>>>>simply and perfectly destroys the validity of the tests! (Just to tell the truth
>>>>to many testers here around: you should NOT update your progs in a test
>>>>"tournament" because that makes the whole tournament invalid.
>>>
>>><snip>
>>>
>>>
>>>Hi Rolf,
>>>
>>>  Very nicely put.
>>>I would like to point out another thing - Vincent mentioned that he was asked
>>>for Hardware : some 6 years or so back - not yesterday or last year !
>>
>>No
>>
>>Vincent admitted that the ssdf did not ask him for hardware but only said that
>>they have not enough hardware to test it immediately.
>>
>>>
>>>Ed also keeps mentioning - "not anymore" - does this mean that there
>>>may/definitely(!?) have been problems earlier on in the past ?
>>>Could this h/w request also be a thing of that same past period ? :)
>>
>>It was not about hardware request but about the Fritz autoplayer that was not
>>public at that time.
>>
>>Things were changed since that time.
>>
>>Uri
>
>I don't know why I am bothering to reply to you , but since everyone should be
>given a fair chance to understand , here goes a last ditch effort.
>
>(All references to "you" below are to a third person/neutral observer).
>
>1) You come to me and say - "can you test my program" , I say "No because I dont
>have any machines free now , If only I had ...."
>
>Yes this is not a straight forward request for machines - but in real world -
>esp among businessmen (though not necessarily only among them! ) - it could mean
>only "I need more machines , if you want me to start testing now , please help
>me by helping me with machines so that I can help you !".
>
>note : could be still be considered as a help / loan / whatever.
>
>Nobody is corrupt or wants to be in this case - no machines free for testing ,
>so physical impossibility to test.
>
>Now if I have lot of resources and money , and say I am the CEO of a
>hypothetical company chessX then I could say "Hey , my new program unnamedX
>needs to be on the list immediately since it will get released soon/was just
>released. So here are 10 dual boxes for your testing" - this can be considered
>as pure help/loan.
>
>But in real world ,when I do this , I earn a "favour" as Mario Puzo would have
>put it :) Which I may use in future at an appropriate time.
>
>(at some future date) I could say :
>"I helped you in past when you had no machines ,so :
>(the above would be an implicit statement - I need not mention it explicitly !)
>
>a) please insert this program which just got released - lot of revenue is
>dependent on it - I just want your honest test opinion. (another matter that my
>program will have a book to kill all opponent books in the testing forum).
>
>b) Please release list a few days/weeks before normal date" , etc , etc
>
>Now if you are going to say that I sound like a conspiracy theorist , please
>look at SSDF history and things will fall into place , people who know it will
>understand what I am saying here ...
>
>So the independence of the testing entity is already lost ! - without even them
>attempting anything wrong or having any evil/corrupt intentions.
>
>2) I introduce a dirty bug - (intentional/unintentional debate could last
>forever !) now surprisingly , this hurts me very little while hurting the
>opponent big time (already suspicious). I don't publish the possibility of
>this bug or proactively fix it (suspicion ++).
>When people find this , suppose there is major hue and cry - "stupid bug
>wreaking games" - "results are skewed 'cos of this" , "automated internet based
>tournaments are the way to go" ( ;-) ) , etc , etc.
>
>What do I do ?
>
>If I am evil CEO , then :
>Fix this obvious bug , but introduce more subtle bugs - which are not so easy to
>spot. (uci 1mb ?!)
>
>Now , you will say again that this is also another subtle bug which was not
>intentional.
>Then we have the process priority issue - this can be debated to death that it
>is not existing (on a dual box with pondering on in a two engine match - I have
>personally seen this happen).
>
>Or you could say that in latest version this is fixed and was a bad bug in
>previous release.
>
>But the history of this product will make me suspect more "bugs" in this new
>version too !
>ones which will take more time and testing to debug and ones which will be
>tougher to identify and track.
>
>It is like the normal problem with chess prog clones - initially they were
>stupid enough to only change the program/version strings in code , now they are
>more sophisticated - people flick bulk code snippets , becomes tougher and
>tougher to identify and track a clone.
>
>Similarly , it will become tougher and tougher to find these "bugs" in the
>interface.
>GCP was kind enough to make the 1 MB bug publicly known , though some people
>knew this privately for some time - including me (when I wanted to add support
>for uci to earlier mess versions) , and I had informed some of my friends
>already about this !
>So just 'cos you do not hear of the new "bugs" need not mean that others don't
>know or that they don't exist !
>
>This means , that the product is already suspect !
>
>All this can be said to be old history , etc , etc , etc but as time progresses
>, more and more new ways will come up - and maybe 5 years later Vincent or
>someone else will again post on how some interfaces cheat using some xxxxx
>inconsistency in the yyyy and how , due to it , some programs zzz1 , zzz2 , etc
>from company qX were benefiting for past 8 years.
>Again we will debate and then in end conclude - all this is history and forget
>and forgive !
>
>Fact does not change that throughout history of the list , you have only
>controversies and that at no point of time you can take the results to mean
>anything substantial.
>
>Even if we are willing to overlook the fact that no games are posted/seldom
>posted and assume that everything is properly tested and there are no evil
>intentions in SSDF maintainers/testers/managers/etc , the list already becomes
>suspect !
>
>SSDF is a great list and no doubt gives a good idea about the approximate
>strength of programs - but among top 25 - I would not say that any one of them
>is better than the other based on ssdf results even if you have like 120 elo
>diff due to the way things are going.
>When the relative position of the commercials in the ssdf gets used for their
>commercial purposes , then we have commercial interests coming in direct anyway
>!
>
>Time to end my rambling ! :)
>
>Mridul


You are a deep thinker! I support your conclusions 100%. You also put Vincent's
contribution into the right context. Ed, meanwhile, is conspicuously the one who
sees "paranoia" where commercial interests are very concrete and real.

Mridul (excuse me, I can hardly spell your complicated first name), the best
part of your message is the clarification that the validity of the testing
procedure is already lost if such a possibility of exerting influence exists.
You can only lose your virgin status _once_ as far as validity and other
standards are concerned. However small the influence might be, however old the
influence might be, once the virginity is lost, you are no longer in a neutral
state of testing.

I am grateful that you also stated that the individual SSDF testers are
innocent. They became victims, completely innocent, of a too strong and naive
belief in their own chess games - while the formal details of the procedures
became the real and decisive factors of the resulting ranking list. In fact the
"Swedes" became victims of their proverbial friendliness to all sides and in
the end handed the initiative to the companies, while still believing that they
had the say. For scientists, and you must be one yourself, it is obvious why
the SSDF couldn't make independent tests. Certainly NOT because they intended
to cheat! But because of their proverbial neutral friendliness to all sides,
where a strict no-contact policy should have been guideline number one! But how
to explain to a layman that he is NOT doing correct tests? - Vincent, too, is
by no means a scientist himself, as we could see more than 102 times in debates
with Bob (Hyatt). But here Vincent is absolutely right in his criticism.

There is only one thing that makes me wonder. If Vincent were really that
critical towards weak methods and naivety, how could he behave as he did in
the Graz scandals? It's impossible for a Dutch FIDE master to overlook that a
player has no right to throw a drawn game!!! So, it is this arbitrariness that
makes me a bit nervous when I must judge the critical potential of VD. But
again, he's not to blame since he's not a scientist.

With a very collegial hug I leave you for a happy Sunday, you made my day,

Rolf



Last modified: Thu, 15 Apr 21 08:11:13 -0700
