Author: Bernhard Bauer
Date: 07:44:01 05/18/99
On May 18, 1999 at 10:11:45, Chris Carson wrote:
>On May 17, 1999 at 13:32:57, Goette Patrick wrote:
>
>>Hello,
>>
>>As I am a new user of Chess Master 6000 (I never had a previous version of CM
>>before), I would like to know, based on your experience with this software,
>>whether CM has a reputation for being weak in positional play, as opposed to
>>its tactical play and its strength in endgames.
>>Let me explain: I just tested 3 programs with the LCTII chess test (by
>>Louguet, a French author). Here are the results I obtained on my 166 MHz
>>Pentium with 32 MB RAM (hash tables set to the same size):
>>
>>                     Virtual Chess 1.02        Hiarcs 6.0        CM6000
>>
>>Positional play :    200/420 = 47.6%           210/420 = 50%     145/420 = 34.5%
>>Tactical play   :    195/360 = 54%             235/360 = 65.3%   280/360 = 77.8%
>>Endgames        :    60/270  = 22%             95/270  = 35.2%   120/270 = 44.4%
>>
>>TOTAL                2355 ELO (per LCTII)      2440 ELO          2445 ELO (!)
>>
>>
>>Well my questions are these :
>>
>>How do you explain the relatively weak performance of CM6000 in positional
>>play compared with the 2 other programs?
>>
>>How relevant is the LCTII test for evaluating the strength of programs, when
>>we know that the true ELO rating of CM6000 is at least 100 points higher?
>>
>>Which tests are the most relevant for estimating the true ELO rating of a
>>chess program? (I mean those whose results come close to the ELO estimated
>>from a large number of games played in the usual way.)
>>
>>Every remark, opinion, explanation welcome.
>>Patrick Goette
>>patrick.goette@smile.ch
>
>Most test suites are only valid for the machine they were calibrated on. The
>problem is that when you move away from that machine, the time limits may no
>longer give an accurate rating (although they still provide useful information
>about evaluation and searching).
>
>I see these problems with most test suites:
>
>1. Some problems are too easy to solve (< 1 min on a PII 300). A small
>   solution time makes scaling to faster machines difficult and makes it
>   hard to distinguish between programs.
>
>2. Some problems are too hard to solve (> 3 min on a PII 300). A large
>   solution time makes scaling to slower machines difficult and makes it
>   hard to distinguish between programs.
>
>3. A suitably wide range of problems is difficult to identify. Each position
>   needs a clear-cut best solution that is nonetheless hidden several plies
>   deep, and different position types are needed: quiet positions (no clear
>   tactics), clear combinations, and endings.
>
>My test suite requirements look like this:
>
>1. 36 EPD positions.
>2. 12 quiet positions, 12 combinations, 12 endings
> with clear best moves or clear alternate
> moves.
>3. 1 to 3 min average solution time on a PII 300 for the top 10
>   commercial programs. This would make the total test time 36 to
>   108 minutes on a PII 300 and would increase for slower machines
>   and decrease for faster ones. The positions should produce wide
>   variation among different searching and evaluation styles.
>4. Known PVs for 6 plies.
>5. Let the program/hardware run until the solution is found (and held for
>   3 plies). Use a log formula to calculate a rating (Kaufman proposed
>   2930 - 200*log(T), where T is the total solution time); something like
>   this would produce good rating estimates on both slower and faster
>   machines (the equation would need to be calibrated across 3 different
>   machines; a DX-66, a P90 and a P200MMX could be used to calibrate
>   against the SSDF list and then verified against PII 300 results).
A formula of this type is not a good idea, because T may become infinite when
a position is never solved; otherwise you would have to limit the solution
time. So I would propose a formula of this type:
   Melo = BaseElo + EloRange * (1 / NumberOfTestcases) *
          sum_i { 1 / (1 + time_i / ReferenceTime) }

Where

   Melo          : meaningless ELO number
   BaseElo       : lowest ELO number that can be achieved by this test
   EloRange      : the ELO range of this test
   sum_i         : sums up the term in {} over all positions i
   time_i        : the time needed to solve position i
   ReferenceTime : a reference value for this test. For example,
                   ReferenceTime = 300 would add 0.5 to the sum if a
                   problem is solved in 5 min (300 seconds).
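
To make the comparison concrete, here is a small C sketch of both formulas.
It is not from either post: the function names, the BaseElo/EloRange values,
and the example times are illustrative assumptions, and the base of the
logarithm in Kaufman's formula (which was not stated) is assumed to be 10.

#include <math.h>
#include <stdio.h>

/* Kaufman-style estimate quoted above: 2930 - 200*log(T), with T the
   total solution time in seconds.  Base 10 is assumed here.  An unsolved
   position makes T unbounded, which is the objection raised above. */
double kaufman_elo(double total_seconds)
{
    return 2930.0 - 200.0 * log10(total_seconds);
}

/* The bounded alternative proposed above:
   Melo = BaseElo + EloRange * (1/N) * sum_i 1 / (1 + time_i/ReferenceTime)
   Each position contributes at most EloRange/N, and a position that is
   never solved (time_i -> infinity) simply contributes nothing. */
double melo(double base_elo, double elo_range,
            const double *times, int n, double reference_time)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += 1.0 / (1.0 + times[i] / reference_time);
    return base_elo + elo_range * sum / n;
}

int main(void)
{
    /* Hypothetical solution times in seconds for a 4-position suite. */
    double times[] = { 60.0, 120.0, 300.0, 600.0 };
    int n = sizeof(times) / sizeof(times[0]);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += times[i];

    printf("Kaufman estimate: %.0f\n", kaufman_elo(total));
    /* BaseElo = 1800 and EloRange = 1200 are made-up calibration values;
       ReferenceTime = 300 s means a position solved in exactly 5 minutes
       adds 0.5 to the sum, as in the example above. */
    printf("Melo estimate:    %.0f\n", melo(1800.0, 1200.0, times, n, 300.0));
    return 0;
}

With these made-up numbers the bounded formula always stays between BaseElo
and BaseElo + EloRange, no matter how long a position takes, whereas the log
formula keeps falling as T grows.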
>6. You should always buy based on actual games played; this gives a true
>   rating. The test is just an estimate and provides information about
>   searching/evaluation weaknesses and speed.
>
>This is just my opinion; I am working on a test suite for my own program
>that has these characteristics.
>
>Best Regards,
>Chris Carson
Kind regards
Bernhard