Author: Rolf Tueschen

Date: 04:32:05 01/07/04


On January 07, 2004 at 02:07:08, Sandro Necchi wrote:

PART I

>Sorry, I was not precise enough, so to let everybody understand I will try to be
>more clear:
>
>the percentage of a program to be stronger than another in a single match score
>55 to 45 is about 20% and not 81%
>I do not care about statistic, but about real figure based on many tests and
>experience.

This is a real problem. If you don't care about statistics but still want to say something that IS statistics, or is based on it, then you have a real problem. BTW in PART II I agree with you totally. You are 100% right. But here in PART I you are very inexact again. You separate statistics and REAL figures, and that is almost a philosophical problem. It's science, with logic involved.

Let's see what you are saying. You say that a prog is 20% "stronger" when it has just beaten another prog 55 to 45. Are you sure? With what method can you show how sure you are? There you are directly in statistics again. In PART II you give the correct answer: you have to play MANY games to know something for sure. Probably a thousand games. And here? You think you can conclude that a prog is 20% stronger?

You suddenly use a completely different wording, because we simply don't use the expression "being stronger by so and so %". You misunderstand the statistical expression of being this or that with a percentage of so and so. The percentage of significance (or of certitude, if you prefer) is something totally different from your "20% stronger". So again I ask you to describe with what certitude you can say that one prog is 20% stronger than the other. That is what we were talking about, and what you misunderstood. 80% significance is simply too small for a good statistical result. Know what I mean? And because we want to be 95% sure that our result is NOT by chance, we need the "thousand" games and more.

Look at this. JUNIOR led against FRITZ 5-0 and still wasn't the better program.
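To make the difference between "20% stronger" and "how sure are we" concrete, here is a minimal sketch of the certitude calculation. It is only an illustration under loud assumptions: games are treated as independent, draws are ignored (the 55 to 45 score is read as 55 wins and 45 losses), and a normal approximation is used. The function names are made up for this example and are not taken from the thread.

```python
from math import ceil, erf, sqrt
from statistics import NormalDist


def likelihood_of_superiority(wins, losses):
    """Normal-approximation certitude that the match winner is really stronger.

    Assumption: independent games, draws left out of the count.
    """
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))


def games_needed(score, confidence=0.95):
    """Decisive games needed before a given score fraction (> 0.5)
    reaches the requested one-sided certitude, same assumptions as above."""
    z = NormalDist().inv_cdf(confidence)
    return ceil((z / (2.0 * score - 1.0)) ** 2)


los = likelihood_of_superiority(55, 45)
n_55 = games_needed(0.55)
n_52 = games_needed(0.52)
print(f"certitude after 55-45: {los:.1%}")
print(f"games needed at a 55% score: {n_55}")
print(f"games needed at a 52% score: {n_52}")
```

Under these assumptions a 55 to 45 result stays well short of 95% certitude, and the smaller the real edge, the more games are needed: a 52% score already calls for well over a thousand. Counting draws would change the exact figures, so the numbers here need not match the ones quoted in the thread.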
Statistically I can say that 5 games is just too small a sample to conclude anything about the whole population. Here the population is a huge mass of games. FRITZ already equalized the score in the next 5 or 6 games. So in other words your statistical certitude was down to almost 50%, and that is total chance with zero advantage in strength. All you know is that one game can be won by A, the next by B. Or 5 in a row by A and the next 5 by B. This is all possible. Only with thousands of games could you get a higher certitude, up to 95%. That is the so-called statistical formula.

BTW if you look at the bare results of the SSDF lists, you can see with your own eyes that the range of possible deviation to the left or the right (minus or plus) is HIGHER than the difference in the naked Elo points of the progs. Conclusion: with a couple of MORE games the list would look different. The actual ranking is NOT sure or certain at all! (The SSDF is honest enough to present this range themselves, so they don't cheat at all! They only avoid writing that the actual ranking in the first ranks could also be the reverse. But it follows from the REAL results of their own testing.) This is what I have been saying for years, but nobody will listen, because we don't have a good alternative, and personally I don't want to be involved in such nonsense tests either.

>So if you are interested to know how correct the result is than you
>get 20%.

Objection. Again you misunderstand. By definition the number can't go below 50%, because at 50% you have total chance and no "correctness" at all. It could be so or the other way round. Sandro, please, don't feel insulted again; this is stats, and it's damned hard stuff to digest. Other people have studied such things for years, and you won't get it in hours, not to speak of minutes. Sorry.
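The remark above about deviation ranges being larger than the Elo differences can be sketched with a rough error-margin calculation. This is not the SSDF's own method, only an assumption-laden illustration: it uses the standard logistic Elo formula, treats games as independent wins and losses (draws would narrow the range somewhat), and takes a 95% normal-approximation interval. The function names are hypothetical.

```python
from math import log10, sqrt


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo formula)."""
    return -400.0 * log10(1.0 / score - 1.0)


def elo_interval(score, games, z=1.96):
    """Approximate 95% range of the Elo difference for a match result.

    Assumption: independent win/loss games; score +/- margin must stay
    strictly between 0 and 1 for the formula to be defined.
    """
    margin = z * sqrt(score * (1.0 - score) / games)
    return elo_diff(score - margin), elo_diff(score + margin)


point = elo_diff(0.55)
low, high = elo_interval(0.55, 100)
print(f"55-45 over 100 games: {point:+.0f} Elo, range {low:+.0f} to {high:+.0f}")
```

For a 55 to 45 result over 100 games the range reaches below zero, so under these assumptions the "weaker" program could just as well be the stronger one. That is the same situation as two neighbours in a rating list whose margins overlap.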
>The reason is that if you look at the games you quite probably will find
>variation which scored quite well (or quite bad), thus putting a big weight on
>the final score. This is why it is better to make the same test against other
>chess programs; at least against other 5.

You are absolutely right with your habit. As the book author you must test this way. But you are not primarily interested in the overall strength but in the advantages of certain variations, advantages related to the actual version of the engine. Of course this is decisive for the overall strength too!! And good luck for your future engagements!

PART II

>There are only 2 ways to know if a program is better than another one:
>
>1. To make a huge amount of games against several opponents; at least 1000
>games. This everybody can do.
>
>2. To look at the games and analyze them. You need to be a strong player to do
>this and/or to know chess programs a lot as well.

But since computer chess still isn't at GM strength, players less strong than a GM can also analyse computer games. :)

Rolf

>
>Sandro


Last modified: Thu, 07 Jul 11 08:48:38 -0700
