Author: James T. Walker
Date: 17:46:36 05/22/01
On May 22, 2001 at 15:25:08, Dann Corbit wrote:

>On May 22, 2001 at 14:50:00, James T. Walker wrote:
>
>>On May 22, 2001 at 13:39:22, Dann Corbit wrote:
>>
>>>On May 22, 2001 at 13:27:09, stuart taylor wrote:
>>>[snip]
>>>>Yes, Crafty is number 4, which I overlooked. Sorry!
>>>>But I didn't overlook the commercial list. But that was very few games, which is
>>>>good, but says very little. But you can't just bundle all the amateur programs
>>>>together with it to make CM8K look so great overall. It's nowhere near the
>>>>same category as tests against recent commercial programs.
>>>
>>>In what way?
>>>
>>>According to the last WMCCC, the world champion was Shredder. The runner up was
>>>Ferret, an amateur program.
>>>
>>>In the previous CCT contest, in which several commercial programs participated,
>>>the winner was...
>>>Crafty -- an amateur program.
>>>
>>>Look at the recent Leiden contest. Some amateur programs were near the top and
>>>triumphed over some commercial entries. There used to be a large gap between
>>>the amateur and commercial programs. I believe that the gap was mostly due to
>>>superior opening books of the commercial programs. That gap has narrowed, as
>>>the amateur entries now operate with sophisticated opening books.
>>>
>>>I believe that the gap between the strongest amateur programs and the strongest
>>>commercial programs is very small. Of course, there is not enough empirical
>>>data to back up my assertion, so it is only an opinion.
>>
>>What do you mean by "very small??" The latest SSDF list has Deep Fritz at 2650
>>after 470 games and Crafty 17.07 at 2487 after 857 games. What would constitute
>>"enough empirical data??" On what empirical data do you base your "opinion?"
>
>  # Program               Hardware            Rating   +    -  Games  Won  Oppo
>  1 Deep Fritz            128MB K6-2 450 MHz    2650  34  -32    470  66%  2537
>  2 Fritz 6.0             128MB K6-2 450 MHz    2626  24  -24    897  66%  2512
>  3 Junior 6.0            128MB K6-2 450 MHz    2594  22  -21   1109  64%  2490
>  4 Chess Tiger 12.0 DOS  128MB K6-2 450 MHz    2578  27  -27    691  62%  2492
>  6 Fritz 5.32            128MB K6-2 450 MHz    2547  26  -26    741  59%  2485
>  6 Nimzo 7.32            128MB K6-2 450 MHz    2547  24  -24    857  59%  2485
>  8 Nimzo 8.0             128MB K6-2 450 MHz    2539  30  -30    546  58%  2486
>  9 Gandalf 4.32f         128MB K6-2 450 MHz    2529  29  -29    584  52%  2518
> 10 Junior 5.0            128MB K6-2 450 MHz    2528  26  -25    750  57%  2476
> 11 Hiarcs 7.01           128MB K6-2 450 MHz    2526  37  -37    361  48%  2539
> 12 Hiarcs 7.32           128MB K6-2 450 MHz    2525  27  -27    679  56%  2481
> 13 SOS                   128MB K6-2 450 MHz    2524  23  -23    925  53%  2501
> 14 Rebel Century 3.0     128MB K6-2 450 MHz    2514  31  -31    504  50%  2516
> 15 Goliath Light         128MB K6-2 450 MHz    2496  30  -30    546  46%  2527
> 16 Crafty 17.07/CB       128MB K6-2 450 MHz    2487  24  -24    857  47%  2505
>
>In order for confidence to rise from 2/3 to 95%, we must use two standard
>deviations. Within that error bar, Deep Fritz is:
>2650-(32*2)= 2586 ELO. 2650 + 34*2 = 2718
>and Crafty 17.07 (quite an old version) is:
>2487 + 24*2 = 2535 ELO. 2487 - (24*2) = 2439
>
>So, considering the error bars (2439, 2535) : (2586, 2718) we *can* say Deep
>Fritz is a little stronger with pretty good certainty. But (of course) crafty
>has gone through a raft of versions since then. Is the same difference still
>true?
>
>Yace is similar in strength. If given opening books of equal quality, I suspect
>that the best amateur programs (e.g. Yace, Crafty) are very close to even with
>the strongest commercial programs.
>
>I think that most people have no clue what the numbers in the SSDF list mean
>(and I mean not a *single* one of the numbers) and that's too bad. Not saying
>that you don't of course. But even the ELO figure is widely misunderstood.
>
>Does the autoplayer used to play these games still issue a reset between each
>move to the engines (which the commercial programs are designed to ignore)?
>
>Personally, I think the learning attribute of some programs is not a slant
>against the autoplayer, since the games will learn in actual use also and
>thereby improve. But it raises an interesting question. Have the learning
>programs been playing longer than new entries and gaining constantly with their
>learning files? If so, is the test accurate?
>
>In other words, I think:
>1. The SSDF is definitely the best data we have available to determine
>engine/engine strength estimates
>
>2. The data rapidly ages with new versions of programs [e.g. Tiger 12 -- aren't
>we on Tiger 14 now?, crafty 17.07 -- aren't we on 18.9 now?]
>
>3. The data is valid ONLY for the machines in actual use and the exact
>conditions of the trials. On different architectures, the engines quite likely
>will play very differently. I have observed this effect very much so with Intel
>compiler builds running on older and on newer machines. Up to a factor of 4 in
>difference from what you would expect judging by CPU MHz alone.
>
>4. There may be flaws with the experiment (but I have yet to see a better
>design)
>
>5. The error bands in the strength figures are widely misunderstood.
>
>I don't know that you will agree with me, but I expect you can see what I am
>driving at by now.

Hello Dann,
Of course I agree with your numbers since they are based on solid math
principles. The problem is this same "excuse" has been used since I first came
here about 3.5 years ago and started testing Crafty 14.something. It was
approximately 150 points behind the latest commercial programs then and as near
as I can tell it's about the same distance behind now. Of course you can use the
uncertainty factor to explain that they may be closer than the numbers show but
that has been the same excuse for 3 years.
I think if anything the past 3 years have shown that the data is more reliable
than you are willing to accept. Yes, you change commercial programs every year
or two and Crafty changes every month or less, so you will always have this
"excuse". The problem with your uncertainty factor is that the two ratings have
never merged together as you indicate they possibly could.

I have no doubt that Bob is leading the way in several areas concerning chess
programming and a lot of programmers are following his lead in these areas, but
to me it is obvious by now that Crafty's strength overall is still lagging
behind the top commercial programs. I have my own database of over 12000 games
now and several versions of Crafty are included, and the data shows that Crafty
is improving at about the same rate as the top commercial programs in computer
vs computer games. I don't have a clue about computer vs human contests since I
believe this is very different and does not necessarily follow the comp vs comp
statistics.

In any case we can agree to not agree on this and there is no need to argue
about it. Believe me, if/when Crafty tops the SSDF list I will be one of the
loudest in cheering. I will also cheer when Ferret becomes available to the
public. (I will cheer for about any advancement in computer chess :-) ) In the
meantime I enjoy matching the programs against each other on equal hardware and
watching the fireworks.

Regards,
Jim
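
For anyone who wants to redo the error-bar arithmetic quoted above, here is a minimal Python sketch. It assumes, as the quoted post does, that the +/- margins in the list are roughly one standard deviation, so doubling them gives an approximate 95% band. The helper names `interval` and `overlaps` are made up for illustration and are not part of any SSDF tool.

```python
def interval(rating, plus, minus, k=2):
    """Return (low, high) after widening the listed +/- margins by a factor k.

    Assumption: the listed margins are about one standard deviation,
    so k=2 gives an approximate 95% band.
    """
    return rating - k * minus, rating + k * plus


def overlaps(a, b):
    """True if the two (low, high) bands share at least one rating value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Figures taken from the quoted SSDF list.
deep_fritz = interval(2650, plus=34, minus=32)
crafty = interval(2487, plus=24, minus=24)

print("Deep Fritz:", deep_fritz)
print("Crafty 17.07:", crafty)
print("Bands overlap:", overlaps(deep_fritz, crafty))
```

With the listed figures the bands come out as (2586, 2718) and (2439, 2535), which do not overlap. That matches the reading in the quoted post for those two versions on that hardware, and says nothing about later versions.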
Last modified: Thu, 15 Apr 21 08:11:13 -0700