Author: Rolf Tueschen
Date: 09:55:56 09/13/02
On September 13, 2002 at 12:26:33, Joachim Rang wrote:

>On September 13, 2002 at 10:54:27, Rolf Tueschen wrote:
>
>>On September 13, 2002 at 10:39:48, Joachim Rang wrote:
>>
>>>I disagree:
>>>
>>>If you got a result 52-48 you can't say which engine is better, but if you got
>>>a result 5200-4800 you can at least say with 99% probability that program A
>>>performs better against program B (which doesn't mean that program A performs
>>>better than B against other programs).
>>
>><smile>
>>
>>And you were sure that A is "better" than B?
>>
>>But I went too far. I ask you: Are you _sure_ that, with CC and the many
>>uncontrolled variables, you then know the better performance with 99%?
>>
>>Prove it. But please not just by reading the tables in books on statistics.
>>Also elaborate why you are allowed to make use of the specific tables. You made
>>the necessary checks? You have all variables under control? (etc.)
>>
>>Rolf Tueschen
>
>
>What's your point? I'm not sure, but I can assume with 99% probability that
>program A performs better against program B. And if I test program A against all
>other programs N, with similar results, I can assume with 99% probability that
>program A is _better_ than the other programs. Maybe I'm wrong and it's only 95%
>probability, or maybe only 90%, but in either case I got a high probability.
>
>Well, I can't prove that, but what are the indications that one can't assume
>these probabilities?

Easy one. I differentiate between the mere factual numbers in results or statistics and their real meaning, also under the aspect that all the laws of, say, statistics must be respected. Simply because otherwise the best routine makes no sense.

Just to give a single example. If you had some bias in the versions, the question could be why the "better" prog won only by 400 points in your example with almost 10,000 games. Did the learning prevent the defeat? What if learning was a completely uncontrolled bias between the two progs? In short, all these questions must be reflected on before you start your whole test, or before you draw any conclusions from your results.

The mere multiplication of N alone can't bring you closer to the wished-for result in the end and could be a big waste of time. It is not ok to assume: ok, now I made 40 games, but I could make 400 and then I would have certainty. Or: if I had the time and could run a test with 5000 games, I would be the best tester in the world. My point is that this is pure nonsense. Therefore I always read with reserve when the SSDF proudly mentioned their 20 or 60 thousand games in these two decades. This number alone means nothing at all.

If I like cherries, a complete list of vegetables is no help for me, especially if there are many beans again this season. I'm looking for cherries. Cherries have several factors I like: colour, perfume, taste, size, to name just a few. Now what is this, when the SSDF is testing strength, if the leading progs are almost the same? There is too much bias in the SSDF.

Much better would be judgements like: JUNIOR plays inspired chess by saccing and exploiting the chaos. FRITZ is deepest. If that were true. Or: XY has the best learning feature. Does the SSDF or anybody else research such questions? Of course not. And here I am on the side of certain critics.

Suddenly the autoplayer was invented. And without further thinking the SSDF thought that this was a terribly good idea. But as we know, the concentration on mere autoplaying results in nothing. The resulting differences are not significant.
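To put numbers on the 52-48 versus 5200-4800 comparison, here is a minimal sketch in Python (mine, not from the thread) of the textbook calculation behind the 99% claim. It assumes independent games, no draws and no bias, which are exactly the assumptions in dispute here; the helper name significance is made up for illustration.

    import math

    def significance(wins, losses):
        # z-score for the null hypothesis "both engines are equally strong",
        # using the normal approximation to the binomial:
        # under H0 the win count has mean n/2 and std dev sqrt(n)/2.
        n = wins + losses
        return (wins - n / 2) / (math.sqrt(n) / 2)

    print(significance(52, 48))      # ~0.40, far below the ~2.58 needed for 99%
    print(significance(5200, 4800))  # ~4.00, well past 99% -- but only if the
                                     # games really are independent and unbiased

Note what this does and does not show: a larger N only shrinks the sampling error. It does nothing against a systematic bias such as uncontrolled learning, which shifts the result by the same amount no matter how many games you play.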
And quality is not being tested at all. I could continue like this for weeks, but the famous funnel from Nuremberg is not the best method to teach. Let's see what the debate brings.

Rolf Tueschen