Author: Rolf Tueschen
Date: 15:13:56 08/26/02
On August 26, 2002 at 07:54:49, Uri Blass wrote:

>On August 26, 2002 at 07:21:58, Rolf Tueschen wrote:
>
>>On August 26, 2002 at 06:45:03, Uri Blass wrote:
>>
>>>On August 26, 2002 at 06:18:11, Rolf Tueschen wrote:
>>>
>>>>On August 25, 2002 at 12:00:06, Peter Fendrich wrote:
>>>>
>>>>>On August 25, 2002 at 07:59:54, Kurt Utzinger wrote:
>>>>>
>>>>>>Please have a look at
>>>>>>
>>>>>>http://ccc.it.ro/search/ccc.php?art_id=217174
>>>>>>
>>>>>>Regards
>>>>>>Kurt
>>>>>
>>>>
>>>>=============================================================================
>>>>>These tables are not accurate at all for the lines covering only few games.
>>>>=============================================================================
>>>>
>>>>So, the tables are not correct (for the cases when you only have few, very few
>>>>games!), "because" the tables require normal distribution. So far so good. Now
>>>>you are arguing, let's take binomial or trinomial, and then we could get rid
>>>>of the limitations when you have very few cases (like in SSDF)? I hope I had no
>>>>language interferences?
>>>
>>>ssdf usually play hundreds of games with every program so I do not see the only
>>>few games problem.
>>
>>Excuse me, but I see it. How many hundreds of games they play, that could be
>>added up?
>
>Here is the list of the programs above 2600.
>You can see that the programs played usually more than 400 games

Yes, Uri, I knew it. But! I wrote "that could be added up". Didn't you know that the newest progs also play neandertal (M. Scheidl)? How could they even suggest that we accept such nonsense? The term validity comes into play. I repeat the question. What do they measure?? Learning function? Or book? This is so trivial for someone who knows what is important in statistics. But this is all repetition. I think I had already explained all that in May 2002. Only, you will notice that the SSDF team doesn't answer the exact questions.
Not that it matters, because I'm talking about undeniable facts. This is not my personal preference, my idea, or wishful thinking.

>
> # Program and hardware                    Rating   +    -  Games  Won  Av.opp
> 1 Fritz 7.0 256MB Athlon 1200 MHz           2741  30  -29    574  64%    2636
> 2 Shredder 6.0 Paderb 256MB Athlon 1200     2727  34  -32    467  65%    2619
> 3 Chess Tiger 14.0 CB 256MB Athlon 1200     2721  33  -32    487  63%    2627
> 4 Gambit Tiger 2.0 256MB Athlon 1200        2718  31  -30    523  60%    2645
> 5 Shredder 6.0 256MB Athlon 1200 MHz        2717  32  -31    505  64%    2618
> 6 Deep Fritz 256MB Athlon 1200 MHz          2716  33  -32    491  63%    2622
> 7 Junior 7.0 256MB Athlon 1200 MHz          2689  29  -29    593  58%    2632
> 8 Rebel Century 4.0 256MB Athlon 1200 MHz   2684  33  -32    475  63%    2586
> 9 Hiarcs 8.0 256MB Athlon 1200 MHz          2671  28  -28    624  55%    2638
>10 Shredder 5.32 256MB Athlon 1200 MHz       2669  30  -30    538  57%    2622
>11 Gandalf 4.32h 256MB Athlon 1200 MHz       2652  34  -33    430  54%    2624
>12 Deep Fritz 128MB K6-2 450 MHz             2651  23  -23    959  61%    2571
>13 Gandalf 5.0 256MB Athlon 1200 MHz         2642  49  -50    202  46%    2674
>14 Gambit Tiger 2.0 128MB K6-2 450 MHz       2641  29  -28    634  66%    2525
>15 Gandalf 5.1 256MB Athlon 1200 MHz         2638  26  -26    707  55%    2601
>16 Junior 7.0 128MB K6-2 450 MHz             2632  25  -25    815  65%    2524
>17 Chess Tiger 14.0 CB 128MB K6-2 450 MHz    2629  28  -27    667  62%    2543
>18 Shredder 6.0 UCI 128MB K6-2 450 MHz       2627  55  -54    168  57%    2581
>19 Fritz 7.0 128MB K6-2 450 MHz              2625  41  -41    294  53%    2604
>20 Fritz 6.0 128MB K6-2 450 MHz              2619  21  -21   1110  61%    2541
>21 Crafty 18.12/CB 256MB Athlon 1200 MHz     2613  30  -29    561  53%    2593
>22 Shredder 5.32 128MB K6-2 450 MHz          2605  28  -27    639  58%    2547
>
>>
>>
>>>
>>>>
>>>>Without agitation let me make this very clear. Any attempt to show something
>>>>reasonable out of only very few cases (like in SSDF) is a myst. The limitations
>>>>out of very few cases are absolutely given. There is no way or "trick" to heal
>>>>that.
>>>>
>>>>There is only one single remedy and that is the higher number of cases. And
>>>>therefore the actual practice of SSDF is meaningless.
>>>>And no adding would help you out of this mess since you are presenting over
>>>>30000 games but these games come from totally incomparable entities. But you
>>>>could have known this before. The adding of games in human chess is a
>>>>completely different process.
>>>>
>>>>BTW let me repeat the question where you take the validity from in SSDF. What do
>>>>you measure? And how did you find control mechanisms?
>>>>
>>>>Also interesting could be where the similarities in Swedish ELO and human chess
>>>>ELO are coming from? Is this decided by definition? When was it done?
>>>>
>>>>Rolf Tueschen
>>>
>>>The list is calculated also based on games of humans against old computers.
>>
>>Tournament games? Do you know details about the very few games then? I think we
>>are talking about a myst, excuse me.
>
>You can download 14738 of their games in
>http://home.interact.se/~w100107/welcome.htm but unfortunately I do not find
>comp-human games there.
>
>
>You can find list of human-calibration results from 1987-1991 when 24 old
>programs played against humans and got rating based on average number of game
>that is slightly more than 10 games for program but unfortunately there are
>only results and no games when chris carason games do not include the games that
>they talk about.

That isn't even the most interesting thing. I take it for granted that they played these games. But. You can't take some 20 masters from Sweden and let them play a few skittles. This is not calibration. It's a joke. Do you think that "masters" had something to fear from commercial progs? I don't think so. Had they knowledge of the progs? Training? Interest at all? Incentive? Money? Where are the data from these events? The evidence? It doesn't work like that! You can't take some old master who still has 2450 in the lists and then put him in front of a program. And then you take the results as proof of the strength of the machine.
Uri, I know that you are participating in Israel's championships and therefore you know that what happens in such skittles is not realistic. Nothing is at stake there. The computer side takes masters to get Elo numbers! It isn't kosher, to say the least. It is surely _not_ calibrating. The absence of the game scores is absolutely uninteresting when we are talking about skittles. And here the argument that SSDF is _not_ about science, but a private hobby, is _not_ acceptable. You see how you understood calibration! But without calibration and validity you have nothing but results and performances. But not Elo numbers comparable to human chess. BTW all this is _not_ a question of intelligence. Even the most intelligent people can be cheated with statistics. Because once you rely only on your natural human estimation, you must inevitably miss the statistical tricks. With stats you can prove that toothbrushes cause the birth of babies. And with SSDF I can prove that FRITZ has 3000 ELO. :)

>
>see http://home.interact.se/~w100107/level.htm for list of the programs that
>played against humans and their rating based on the games.

Skittles. Shows. Fun.

>>
>>
>>>
>>>The ratings of the good programs in the list were too high so they decided 1 or 2
>>>years ago to reduce the rating of all programs by 100 elo to make the rating of
>>>the programs in the top of the list more realistic against humans.
>>
>>And now the height is ok? How did you prove it?
>
>I did not prove it but I think that most people agree that 2841 for Fritz7 on
>A1200 is at least 100 elo too high so reducing the number by 100 elo reduces the
>difference relative to humans.

Uri, Uri! It brings tears to my eyes to see you argue so carefully. But you are already intoxicated. Please subtract 300 Elo points and then we can start the debate. Just my opinion. Other numbers are completely unrealistic.
Or did you ever see events over a longer period of time, at tournament level, and with real money at stake? And most of all, did you see fair rules? The rules still come from the old days when progs were no real opponents. And also this. You know exactly that progs are very good at some times and at others as weak as beginners. I don't mean blunders, I mean misunderstanding very basic chess concepts. It's still a mess.

Please, please do not think even for a second that I lack respect for you. Would I write such articles if I did?

Rolf Tueschen

>
>
>Uri
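
The dispute above about how many games are "enough" can be made concrete with a small calculation. What follows is only a minimal sketch, not the SSDF's actual method: it assumes the standard logistic Elo formula and treats every game as an independent win/loss trial. Draws are ignored, which makes the interval somewhat wider than a draw-aware estimate would give, and the independence assumption is exactly what the validity objection (learning functions, opening books) calls into question. The function names `elo_diff` and `elo_interval` are made up for illustration.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(score, games, z=1.96):
    """Approximate (low, mid, high) Elo difference for a score over `games` games.

    Assumption: games are independent win/loss trials, so the standard
    error of the score fraction is sqrt(p * (1 - p) / n). The default z
    of 1.96 corresponds to a roughly 95% normal-approximation interval.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    se = math.sqrt(score * (1.0 - score) / games)
    # Clamp so the logarithm stays defined for extreme scores or tiny samples.
    low = max(score - z * se, 1e-9)
    high = min(score + z * se, 1.0 - 1e-9)
    return elo_diff(low), elo_diff(score), elo_diff(high)


if __name__ == "__main__":
    # Game counts and score percentages taken from the quoted list.
    rows = [
        ("Fritz 7.0 (Athlon 1200)", 574, 0.64),
        ("Shredder 6.0 UCI (K6-2 450)", 168, 0.57),
        ("Fritz 6.0 (K6-2 450)", 1110, 0.61),
    ]
    for name, games, score in rows:
        low, mid, high = elo_interval(score, games)
        print(f"{name}: {mid:+.0f} Elo vs. opposition, "
              f"interval {low:+.0f} to {high:+.0f} over {games} games")
```

Under these assumptions the interval narrows only with the square root of the number of games, so quadrupling the games roughly halves the margin. It says nothing about whether the games measure what they are supposed to measure, which is the separate question of validity raised in the post.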
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.