Author: Sandro Necchi
Date: 05:04:08 07/17/05
On July 17, 2005 at 07:04:21, Pallav Nawani wrote:

>On July 17, 2005 at 05:22:47, Sandro Necchi wrote:
>
>>I have been laughing a lot (maybe crying on the ignorance would have been more
>>appropriate?) reading many wrong statements about testing and Elo lists.
>
>I guess you know better than us, but your arguments hold no water.
>
>>so, for those who are new and do not know, SSDF list is the best for the
>>following reasons:
>>
>>1. They use 2 computers and the program complete with own book and ETG, with
>>own gui and best setting as suggested by the programmer.
>
>Depends on what you want to test. If you want to understand the strength of the
>_complete package_, then yes, using own Gui, using own Book with learning on is
>the best way.

Most users want exactly this.

>However, testing the strength of the _engine_ in isolation also
>means using the openings where it might not play as well. This is also a valid
>way of testing, despite claims to the contrary.

Yes, if one wants to make a list of weaknesses and strengths for engines.

>
>>2. They use long time controls (40/2h 20/1h; international level) only.
>
>Irrelevant.

I do not agree. One example: Shredder's books have been made for long time controls. I mean the selection of the moves has been made for that, so the book would be less good in blitz games. This shows your statement is false.

>For _rating_ (Mind you, _ratings_) Any time control is good enough
>as long as it is not so small that programs lose on time.

There is a way to avoid this...a good one.

>CEGT time control is
>long enough, IMHO. What is important is having enough number of games.

Quality is important as well as quantity. It depends on how the tests are made.

>SSDF do
>have a good number of games, of course, but just not enough to differentiate
>between two programs that are very close in strength.
If after 1000 games 2 programs are very close, do you really think that after 10000 they will not be the same? And if there is a difference of 2-10 points, would that make a difference for a user?

>
>>3. They use the same hardware for all programs.
>I agree that this is a good thing, because this makes the matches more
>consistent. On the other hand, different hardware affects different programs
>differently, so (in theory) the result we have is only correct for SSDF hardware.

Of course!

>For instance, how would crafty perform if they used 64 bit Opterons? However,
>the differences are usually small.

Well, if you can afford 2, then you are lucky! I believe SSDF will switch to that hardware sooner or later anyway.

>
>>4. They use a very wide range of programs and not only the new ones to get more
>>reliable results.
>I think every rating list does that.

I was listing all the points which are important, not claiming that other testing methods lack them.

>
>>5. Ponder on and learning are activated.
>For rating purposes, ponder on is irrelevant, since pondering is effectively
>nothing more than giving more time to an engine.

I do not agree. An engine that is better at guessing the opponent's reply can play better and reach higher depths. You are handicapping some engines by removing this option.

>For learning see the answer to
>your point 1.

Learning is what makes the computer more human-like and not stupid (losing games in exactly the same way it did before).

>
>>So, anybody can test in a different way as they wish, but to claim that system
>>is better or replacing the SSDF system is pure nonsense!
>
>While I agree that SSDF gives us a good idea of the complete package, claiming
>that SSDF is better is also pure nonsense. For instance, Shredder 7.04 and
>Shredder 8 are very close in the SSDF list. Do you think that Shredder 7.04 and
>Shredder 8 have the same strength?

They are different and close to each other in strength.
I think the SSDF list is correct in this case. Also, again, Shredder 7.04 is tested in the UCI version while Shredder 8 is tested in the CB GUI. The UCI GUI offers more performance and better learning, so Shredder 8 UCI should be about 25 points stronger than Shredder 8 CB. I am talking about the complete package, of course.

>Every rating list has its anomalies..

Yes, but I like to be very precise in all my statements...as you can see...

>
>Best Regards,
>Pallav

Best regards
Sandro
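
For readers weighing the 1000 versus 10000 games question above, here is a rough sketch of how the statistical margin on an Elo difference shrinks with the number of games. It is not taken from SSDF or CEGT; the 30% draw rate, the 95% confidence level, and the function names are assumptions chosen only for illustration, and it uses the standard logistic Elo formula for two evenly matched programs.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, score=0.5, draw_rate=0.3):
    """Approximate 95% margin (in Elo) on a match result.

    Assumptions: independent games, and a draw rate of 30% unless told otherwise.
    """
    win_rate = score - draw_rate / 2.0
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = win_rate + draw_rate / 4.0 - score ** 2
    std_err = math.sqrt(variance / games)
    # Slope of the Elo curve at this score converts score error into Elo error.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_err * slope

for n in (1000, 10000):
    print(n, round(elo_margin_95(n), 1))
# prints:
# 1000 18.0
# 10000 5.7
```

Under these assumptions a 1000-game match leaves a margin of roughly 18 Elo either way, and 10000 games narrows it to under 6, so a gap of 2-10 points sits inside the noise at 1000 games. Whether a gap that small matters to a user is a separate question.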
Last modified: Thu, 15 Apr 21 08:11:13 -0700