Author: Mike S.
Date: 04:05:50 01/04/04
On January 03, 2004 at 20:53:39, Rick Rice wrote:

>(...)
>Ideal solution would be for SSDF to have one massive board with one CPU and
>memory for each program (equal CPU and mem for all the progs on its list) and
>some way to automate the play of these programs against each other..... on
>different time controls such as regular, blitz etc. Just wishful thinking for
>the future, but it would eliminate the multiple and varying results.

Actually, that's what they do anyway, with the exceptions that they run only 40/2h and, of course, can't test every existing engine, due to limited capacity.

But we can ask even further: what's the use of knowing which engine is "most probably" 20 Elo points better or worse? How do you use that information, what is the sense of it?

Programmers need such information to determine whether their current engines have improved, in addition to other tests (positions, and more technical things I guess). But they need their own sources anyway, because the SSDF and other "public" testers usually can't start testing before the engine is released. (Beta tests in the common rating lists remain exceptions.)

It has been explained that even ratings based on many games don't allow highly reliable predictions for short matches, e.g. 10 games, especially when the strength margins between many engines are as narrow as they are now. In short matches, (nearly) anything can happen (the first sketch in the PS below illustrates this).

1. When somebody wants to run engine X against Y, does he need rating information beforehand, for a prediction? What for? (I just hope nobody runs matches only to verify or falsify predictions based on some ratings :-))

2. If the rating competitions can't produce data that is really worth the effort, I'm beginning to doubt that they make sense at all.

3. I suggest that every fan and user should define his *individual* set of requirements, IOW what an engine should be capable of so that he considers it to suit his needs, or what it takes for him to consider it a top engine worth buying the next version of, etc. This doesn't have to be objective (it may rather be individually subjective).

For example, I've defined a set of tactical positions which tell me about an engine's combinative strength; the second sketch in the PS below shows how such a suite can be run automatically. For me, this is more interesting than overall comp-comp gameplay performance. SSDF Elo ratings are relatively "artificial" values, correctly based on performances of course, but not meaning much for the way I actually use engines. Another thing is endgame knowledge: I like it better when engines have some real understanding of the late endgame and don't rely too much on tablebases (but I haven't defined a test set for that yet). These things don't have to correspond to the strength relations in games at all.

Another user may be more interested in good positional play or evaluation, understanding of positional elements... Stronger players than me can probably define their own test sets for that as well, or even judge it by playing against the engines themselves. Some users do this, and/or provide in-depth commentary in the message boards.

I think that way of (individually) getting impressions and estimations of engines should be used much more often than always looking at all these Elo figures. It gets boring. An engine may be 100 Elo points behind and still play great, entertaining games (and vice versa)...

Regards,
M.Scheidl
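
PS: Here is a back-of-the-envelope sketch (in Python) of the short-match point above. It assumes only the standard Elo expectancy formula and, as a deliberate simplification, models every game as an independent win or loss with no draws:

from math import comb

def expected_score(elo_diff):
    # Standard Elo expectancy: expected points per game for the stronger side.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def score_distribution(elo_diff, games):
    # Probability of each possible match score (0..games wins) for the
    # stronger side, treating every game as an independent win/loss.
    p = expected_score(elo_diff)
    return [comb(games, k) * p**k * (1.0 - p)**(games - k)
            for k in range(games + 1)]

probs = score_distribution(20, 10)   # a 20-Elo edge, 10-game match
p_no_win = sum(probs[:6])            # stronger side scores 5/10 or less
print("per-game expectancy: %.3f" % expected_score(20))
print("P(stronger engine does not win the match): %.2f" % p_no_win)

Under this simplified model, the engine with a 20-Elo edge still fails to win a 10-game match more often than not. That is all I mean by "anything can happen".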
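
PPS: And a rough sketch of the kind of private tactical suite I mean in point 3. It assumes the python-chess library and any UCI engine; the engine path, the time limit, and the EPD file name are placeholders for whatever one actually uses:

import chess
import chess.engine

ENGINE_PATH = "./my_engine"   # placeholder: path to a UCI engine binary
SUITE_PATH = "tactics.epd"    # placeholder: EPD lines with "bm" opcodes

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
solved = total = 0
with open(SUITE_PATH) as suite:
    for line in suite:
        line = line.strip()
        if not line:
            continue
        board = chess.Board()
        ops = board.set_epd(line)        # sets up the position, parses opcodes
        best_moves = ops.get("bm", [])   # the suite's known best move(s)
        if not best_moves:
            continue
        total += 1
        result = engine.play(board, chess.engine.Limit(time=5.0))
        if result.move in best_moves:
            solved += 1
engine.quit()
print("solved %d/%d" % (solved, total))

Standard EPD suites store the solution in the "bm" opcode; the library parses it into move objects, so the comparison at the end is straightforward.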