Computer Chess Club Archives



Subject: Re: Statistical methods and their consequences

Author: Rolf Tueschen

Date: 06:05:31 02/17/03



On February 17, 2003 at 06:53:14, Uri Blass wrote:

>On February 17, 2003 at 06:29:23, Rolf Tueschen wrote:
>
>>On February 16, 2003 at 13:21:39, Tony Hedlund wrote:
>>
>>>On February 15, 2003 at 07:12:10, Rolf Tueschen wrote:
>>>
>>>>On February 15, 2003 at 05:24:43, Tony Hedlund wrote:
>>>>
>>>>>On February 14, 2003 at 16:27:31, Rolf Tueschen wrote:
>>>>>
>>>>>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>>>>>
>>>>>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>>>>>
>>>>>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>>>>>for errors in interpretation.
>>>>>>>>>
>>>>>>>>>Bob D.
>>>>>>>>
>>>>>>>>
>>>>>>>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>>>>>>>difficult. In short: if you use a system of statistics, you are not allowed to
>>>>>>>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>>>>>>>point. False and not permissible. Instead of 1., 2., 3., they should say 1.-3.; not
>>>>>>>>should, but must, if the differences in the actual results are way smaller than
>>>>>>>>the error of the tests themselves. Is that impossible to understand?
>>>>>>>>
>>>>>>>>Rolf Tueschen
>>>>>>>
>>>>>>>Then the right presentation is:
>>>>>>>
>>>>>>>1-10 Shredder 7         2801-2737
>>>>>>>1-10 Deep Fritz 7       2789-2732
>>>>>>>1-11 Fritz 7            2770-2711
>>>>>>>1-2? Shredder 7 UCI     2761-2638
>>>>>>>1-15 Chess Tiger 15     2753-2700
>>>>>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>>>>>1-16 Shredder 6         2750-2689
>>>>>>>1-19 Chess Tiger 14     2744-2684
>>>>>>>1-19 Deep Fritz         2741-2680
>>>>>>>1-19 Gambit Tiger 2     2739-2681
>>>>>>>3-2? Junior 7           2715-2659
>>>>>>>4-2? Hiarcs 8           2707-2657
>>>>>>>
>>>>>>>and so on.
>>>>>>>
>>>>>>>Tony
>>>>>>
>>>>>>Thanks for the fine joke, Tony. Perhaps you have put your finger on the wound!
>>>>>>You want to have a number one, right? Then you run tests, just as you do,
>>>>>>fair and correct. And then you reach the stage where you must evaluate your
>>>>>>results. You see that you have no clear number one. Now there are two possibilities:
>>>>>>
>>>>>>1) You switch into decisive mode and run further tests; the "list" date can wait.
>>>>>>
>>>>>>2) You stick to your traditions and publish your list. But then, please, do
>>>>>>NOT present the list either in the classical way or in your joking Mr. Bean
>>>>>>version, but simply group it into packages like these:
>>>>>>
>>>>>>1.-3. A B C
>>>>>>4.-5. D E
>>>>>>6.    F
>>>>>>7.-10. G H I
>>>>>>etc.
>>>>>>
>>>>>>Please tell me: where is the problem with this method?
>>>>>
>>>>>Why just the three strongest engines? With the margins of error, Gambit Tiger 2
>>>>>could be as strong as the other top engines. I find Mr. Bean's version more
>>>>>logical than yours. Could you please explain your method further?
>>>>
>>>>
>>>>SSDF has good statistics experts. Consult these experts and you will understand
>>>>why Gambit Tiger 2 could NOT be number one. My first three were a pool in which
>>>>each could be number one. At most, Shredder 7 UCI could be added to it, but my
>>>>example was more a demonstration of such a list. It's not MY method. It's simply
>>>>what careful researchers would do if they had your results. Perhaps you don't know
>>>>it, Tony, but the presentation of the results must be based on the results.
>>>
>>>What do you propose SSDF do exactly? Give me a clear example of how you would
>>>present the data. Don't give me this A, B and C. You have the results; which
>>>programs are A, B and C?
>>>
>>>>In other words it might well be that one day you will have a clear number one.
>>>
>>>So the bottom line is that only when we reach a margin of error close to zero
>>>can we claim a number one? When will that happen? After 10 000 games by each
>>>entrant?
>>>
>>>>Or do you believe that your method guarantees the eternal status quo?
>>>>
>>>>
>>>>
>>>>>
>>>>>>Is it because you have
>>>>>>a kind of strong wish to present a number one at all costs?
>>>>>
>>>>>Do you also think that FIDE shouldn't have a number one on their list? Is
>>>>>Kasparov really the best player?
>>>>
>>>>Please do not seek outside help when you run out of arguments in favor of
>>>>your own presentation.
>>>
>>>FIDE, ICCF and SSDF all have a rating list. And we all use Professor Arpad Elo's
>>>method of measuring strength in chess. And yes, I argue for our way of presentation.
>>>ICCF's number one, Ulf Andersson, has played 25 games! Figure the margin of error
>>>there. They probably don't have any careful researchers.
>>>
>>>>>
>>>>>>Please let's simply
>>>>>>discuss this little topic. If you tell me, listen, Rolf, I am not allowed to
>>>>>>tell you, but you are right: a number one program is very important for us.
>>>>>
>>>>>It seems to be more important to others.
>>>>
>>>>Yes, that was my deeper assumption. Could you give more details?
>>>
>>>Details?
>>>People here at CCC seem to be looking forward to our next list, to see which is
>>>number one. And then they congratulate the programmer. And of course the
>>>commercial vendors use it in their advertising, as they always have. When we
>>>started our list, it was as a complement to our reviews of new programs.
>>>Personally I'm not interested in which program is number one. I'm more interested
>>>in how the different engines are playing.
>>
>>I can well imagine your personal sentiments, and I have great respect for your
>>efforts with SSDF as a whole, but you can't stop history's progress. When you
>>played move by move on the ancient chessboards, your dedication and hard work
>>were really sensational, and people got results where before they had none. Today,
>>with autoplayed games, you have more time to do sound statistics. However, if
>>the top programs simply do not differ that much, then you can't call out a number
>>one. Or you play millions of games. But who guarantees that you will then
>>have a clear first? No, you should accept the actual reality. And that is
>>equality among the top entries.
>>
>>You are mistaken if you think that the gratitude of the CC users was linked
>>to your presentation of a number one. It was because of your general efforts
>>for the good of CC. And the business world at that time was much more varied. But
>>today we have a single important company. Do you want to do your job for them
>>and their marketing interests, or for the users around the world? You must
>>accept that if statistically you have no clear first, then you can't present a
>>number one program.
>
>Number one only means leading; it does not mean best.
>I do not see what your problem with it is.
>
>
>>Why should that concern you??? You are independent! But
>>independent does not mean naive. Why don't you consider the consequences of such
>>strange events: Fritz8 has been out for months and you don't test it. I read that
>>you are waiting until ChessBase sends you a copy. But then that would no longer
>>speak for the independence of your tests, because the time at which testing
>>begins has always been a factor. All such dangers and difficulties you could avoid
>>with sound statistics and certain basic guidelines. You must become independent
>>of such marketing decisions by ChessBase.
>
>I do not see what the problem is with waiting for ChessBase to send the program.
>It is not that they do everything that ChessBase tells them, and
>I believe that if ChessBase asked them not to test programs of another company,
>like Tiger, they would not comply.
>
>I believe that they should test only if the programmers ask them to; otherwise
>they may waste time testing the wrong versions and they will have no computer time
>to test the right versions.
>
>They did not test a lot of programs, and Fritz8 is not alone.
>They did not test Movei and hundreds of free programs, and I see no reason why
>testing Fritz8 should be more important when the programmer did not ask them to do it.
>
>Note that I did not ask them to test Movei, and I do not complain (maybe I will
>ask them in the future, when Movei is significantly better).
>
>Note also that testing Fritz8 is more important than testing Movei if both
>programmers ask them to do it, but if ChessBase does not ask them to do it, then
>buying Fritz8 in order to test it may be a waste of time, because they will
>have no time to test a stronger Fritz.
>
>I think that the customers may also be interested in the rating of the Fritz that
>ChessBase sends them, because I believe that the customers will get the same Fritz
>as an update, and if the SSDF wastes time now on testing Fritz8 they will have no
>computer time to test the upgrade that ChessBase may release.
>
>Uri


You have interesting views on independence. Please come into CTF so that we can
talk about Israel. What you say is unacceptable from the point of view of
independent testing. You don't believe it, but then you have no knowledge of the
necessities of statistics. It's not a question of morals or anything like that,
it's a must! Otherwise the results are NOT independent and you can trash SSDF.

You are giving your personal opinions, and so far nobody is allowed to attack you
for them, but what if you simply had no idea what is going on here? You have no
understanding of the meaning of ordinary terms embedded in daily speech. You say:
but they only tell us who is leading! That doesn't mean that he's the best. But
Uri, that is NOT the point at all. The point is that they cannot conclude that
someone is leading with these 8 points and a margin of 30 on both sides. You
have no idea what that means exactly! So then you can well say, "Let them
do what they do, they are not doing anything wrong"! Uri, they are so wrong,
more than your own Prime Minister! Because they do something very special:

They say that Shredder7 is the new number one, the new leader as you say. And
they give these margins! Together that means: folks, we have no clear result for
place one! And I argue against the mistakes. But here in CCC experts behave as
if publishing the margins made the overall verdict OK, because the experts know
what margins mean. I translate: experts are saying that a lie is not a lie as long
as the experts have a way to see what's really going on. But the lack of
respect for the dumb users is perfectly allowed, because that is business. Against
that confusion I say: no, no, SSDF is responsible, because THEY announced a new
number 1!
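
Here is a minimal Python sketch of the arithmetic behind "8 points and a margin of 30 on both sides", together with the "1.-3. A B C" packages proposed earlier in the thread. It rests on two assumptions that are not SSDF's documented procedure: each published margin is a symmetric 95% half-width, and the two engines' rating errors are independent and normal. The helpers lead_z, tiers and format_tiers are hypothetical. The engines A to I and their ratings are invented for illustration and are not taken from any SSDF list. The greedy tiering rule is only one possible way to build such packages.

```python
import math
from statistics import NormalDist

# Assumptions (hypothetical, not SSDF's documented method): each published
# margin is a symmetric 95% half-width, and the two engines' rating errors
# are independent and roughly normal.
Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def lead_z(lead, margin_a, margin_b):
    """z-score of a rating lead, given each engine's 95% half-width."""
    se_diff = math.hypot(margin_a, margin_b) / Z95  # std. error of the difference
    return lead / se_diff


def tiers(entries):
    """Greedily group (name, rating, margin) tuples into shared places.

    An engine joins the current tier unless it is significantly weaker
    (two-sided, 95%) than that tier's top engine.
    """
    groups = []
    for entry in sorted(entries, key=lambda e: e[1], reverse=True):
        if groups:
            _, top_rating, top_margin = groups[-1][0]
            if lead_z(top_rating - entry[1], top_margin, entry[2]) < Z95:
                groups[-1].append(entry)
                continue
        groups.append([entry])
    return groups


def format_tiers(groups):
    """Render tiers as shared places, e.g. '1.-3. A B C'."""
    lines, place = [], 1
    for group in groups:
        last = place + len(group) - 1
        label = f"{place}." if last == place else f"{place}.-{last}."
        lines.append(label + " " + " ".join(name for name, _, _ in group))
        place = last + 1
    return lines


# An 8-point lead with a margin of 30 on both sides.
z = lead_z(8, 30, 30)
print(f"z = {z:.2f}")
# With a flat prior, Phi(z) approximates the chance the leader is really stronger.
print(f"P(leader stronger) ~ {NormalDist().cdf(z):.2f}")
# Margins shrink like 1/sqrt(games), so the same lead needs (Z95 / z)^2 times the games.
print(f"games needed: x{(Z95 / z) ** 2:.0f}")

# Hypothetical engines A..I with invented ratings, +/-30 each.
engines = [("A", 2769, 30), ("B", 2761, 30), ("C", 2752, 30),
           ("D", 2700, 30), ("E", 2690, 30), ("F", 2640, 30),
           ("G", 2580, 30), ("H", 2575, 30), ("I", 2570, 30)]
for line in format_tiers(tiers(engines)):
    print(line)
```

Under these assumptions the 8-point lead gives z of about 0.37, far short of the 1.96 needed at the 95% level. Both engines would need about 28 times as many games before the same lead cleared that bar. The invented list prints as "1.-3. A B C", "4.-5. D E", "6. F", "7.-9. G H I". Comparing each engine only with its tier's top engine keeps places contiguous. It can still separate two neighbouring engines that are not significantly different from each other. Also, real engines in a rating list partly share opponents, so their errors are not fully independent.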

Rolf Tueschen




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.