Computer Chess Club Archives


Subject: Re: Less Mud, More Light (see also comp-comp)

Author: Graham Laight

Date: 05:14:28 01/09/00

I fear that this thread is becoming uninteresting to most people, but let's give
it one more shot...

On January 09, 2000 at 05:48:57, Stephen A. Boak wrote:

>>On January 08, 2000, Graham Laight wrote:
>
><much snipped from all discussions below>
>
>>I can think of one similarity - they're both a group of players trying to win a
>>game of chess under the same rules.
>
>Hey, one SIMILARITY--pretty good!  Did it take long to think of it?  :)
>
>Let me help you with another similarity that you obviously overlooked--both
>groups of opponents tend to favor Queens over Pawns for material purposes!
>Another astounding similarity, if you just think about it!  I didn't really
>think about it--I just blurted it out because the thought hit me.  Someone else
>can work out the profundity in the observation.
>
>I have additional, inane similarities if you need to draw on more ammunition.
>For a lost cause, however, I will not waste my time.
>
>On the other hand, I can think of several (meaningful) DISSIMILARITIES--
>
>1) One group of opponent players (i.e. opponents of the comps) is ALL COMPUTERS,
>but the other group of opponents is ALL HUMANS.

In terms of playing chess, you might as well have divided the pools into
brown-eyed people and blue-eyed people. They are still entities of one type or
another who are trying to win a game of chess.

>2) NONE of the opponents in one group are opponents in the OTHER group.
>
>3) The ratings of the ALL COMPUTER opponents in the comp-comp pool are all
>derived from relatively recent 100% comp-comp prior play (referring to SSDF
>ratings), not from recent comp-human prior play.  On the other hand, the ratings
>of the ALL HUMAN opponents in the comp-human pool would be all derived from
>relatively recent human-human prior play (FIDE ratings).
>
>4) The SSDF ratings gained by the ALL COMPUTER opponents are not FIDE ratings.
>The FIDE ratings gained by the ALL HUMAN opponents are not SSDF ratings.

Points 2-4 could apply equally to brown/blue eye classification of players.

>5)  The historical connection between SSDF and FIDE ratings is absolutely zero
>(SSDF ratings were seeded many, many years ago with Swedish ratings, not FIDE
>ratings, of relatively low strength players) or so remote as to be of no weight
>today after many years of only comp-comp play.

The SSDF web site and SSDF members have disputed this. If you wish to use this
as a working assumption, it is YOU who must provide evidence to support it.

>6)  A) In statistical sampling, the results of sampling are meaningful (useful
>to draw conclusions with high degree of confidence) only within the normal
>(central) range of the sampled items.  The results (conclusions) of sampling
>become less meaningful as applied to items closer and closer to the boundaries
>or range limits of the sampled items.  It is mathematically improper (illogical,
>without foundation) to draw conclusions about characteristics of items far
>outside the sampled range.  Also, when something is measured against a scale,
>the accuracy of the measurements is meaningful only within the normal (central)
>range of the scale.

By that reasoning, an Elo rating of, say, 1500 could never be compared with one
of 2800, because they are not in the same range.
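For reference, the expected-score formula behind Elo ratings depends only on the difference between two ratings, which is what lets a 1500 and a 2800 sit on one scale at all. The sketch below assumes the logistic form of that formula (rating bodies' published tables may differ slightly). It only shows what the model assumes; whether the model still predicts well across very large gaps is exactly what point 6A questions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Only the rating difference enters the formula, so these pairings give the same prediction.
print(round(expected_score(1600, 1500), 3))  # 0.64
print(round(expected_score(2800, 2700), 3))  # 0.64

# A 1300-point gap still produces a prediction, just a very lopsided one.
print(round(expected_score(2800, 1500), 4))  # 0.9994
```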

>    B) Even if one assumes the initial Swedish seeding ratings were highly
>comparable to FIDE ratings, there is a major problem, due to point 6A, above.
>If relatively weaker Swedish players (i.e. not FIDE IM and GM strength, in
>general) were used as the 'scale' against which SSDF seeded or 'initially
>measured' comps to establish their relative strength versus humans, the seeding
>not only was so long ago as to be of no weight in today's SSDF comp-comp

By the same logic, today's FIDE ratings are not valid compared with the FIDE
ratings of 15 years ago, because it has been so long since they were "seeded".

>ratings, but was also established against a 'scale' of limited range and
>significantly lower human rating average than the alleged ratings of modern
>programs which that seeding allegedly somehow helps validate today.  The problem
>is that it is a violation of statistical logic to use a low level rating 'scale'
>or range (i.e. the relatively low strength Swedish players) to today assert a
>FIDE-equivalent comp (vs human) high level rating, which would lie virtually
>(probably 100%) OUTSIDE THE LIMITS of the initial 'scale'.
>
>>>previously, Steve Boak wrote:
>>>Analogy: Two human runners, ranked in track sports--one (A) very good at long
>>>distance events but very poor at sprint events; the other (B) of medium ability
>>>in either type of event.  If they both enter long distance events, A is likely
>>>to do better than B.  If they both enter sprint events, B is likely to do better
>>>than A.  Now A has not changed, nor has B changed--same runners in both event
>>>Pools, each ranked correctly in relative ability in both types of events.  Yet
>>>their rankings switch places in the different events.  There is no failure of
>>>the ranking system.
>>>
>>>Why?  The competitors entered two events whose compositions are vastly different
>>>in general--sprint event contains mostly sprint specialists; distance event
>>>contains mostly distance specialists.
>
>>But in the case of chess, they're both a group of players trying to win a game
>>of chess under the same rules. Your analogy doesn't do much for me, I'm afraid.
>
>Do all blonde-haired opponent chess players have the same rating?  Do all
>black-haired opponent chess players have the same rating?  If we held two large

Relevance?

>comp-human Pool events, using the same comp players in each Pool but only human
>opponents restricted 100% to a single Pool by hair color, would the comps all be
>rated and ranked the same after each Pool concluded their many games?  Yet by

Depends on whether the human pools' ratings matched each other. This in turn
depends on what steps had been taken to match the ranges.

>your remark, since the opponents of the comps are all a group of players who
>want to play and win at chess, you would expect same relative ratings and
>rankings for all comp entrants in the two Pool events?
>
>Hmmm.  By reductio ad absurdum (carrying/reducing this theme to its extreme),
>this leads to the following--If two distinct opponent groups (Pools) are chess
>players, and if all players want to win (regardless of group), then comp chess
>results (mean individual ratings and relative comp ratings) for comp players in
>both groups will be the same between both groups (Pools).  As ugly a syllogism
>as I'd ever want to stand behind!

It would be true if sufficient steps had been taken to match the rating scales
of the 2 groups.
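The question of matching the rating scales of two pools can be made concrete with a small simulation. The sketch below assumes two hypothetical pools of five players with identical (hypothetical) true strengths, seeded at different starting ratings, who only ever play within their own pool, using a standard zero-sum Elo update with an assumed K-factor of 16. It is not a model of SSDF's or FIDE's actual procedures.

```python
import random


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def run_pool(true_strengths, seed_rating, games=20000, k=16.0, seed=0):
    """Simulate Elo updates for a pool whose members only play each other."""
    rng = random.Random(seed)
    ratings = [float(seed_rating)] * len(true_strengths)
    for _ in range(games):
        a, b = rng.sample(range(len(true_strengths)), 2)
        # Results are drawn from the hypothetical TRUE strengths (draws not modelled).
        p_true = expected_score(true_strengths[a], true_strengths[b])
        score_a = 1.0 if rng.random() < p_true else 0.0
        # Zero-sum update: whatever A gains, B loses, so the pool total never changes.
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


true_strengths = [1700, 1800, 1900, 2000, 2100]  # hypothetical, identical in both pools
pool_a = run_pool(true_strengths, seed_rating=1600, seed=1)
pool_b = run_pool(true_strengths, seed_rating=2000, seed=2)

mean_a = sum(pool_a) / len(pool_a)
mean_b = sum(pool_b) / len(pool_b)
print(round(mean_a), round(mean_b))  # 1600 2000
```

Within each pool the players spread out roughly in line with their true strengths, but each pool's average stays at its seed value, so the same players end up about 400 points apart. Only games between the pools, or some other external calibration, would remove that offset.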

>>Given that the SSDF web site states it, and that SSDF members support the notion
>>in this forum, the burden of proof should be on your side of the debate, not
>>mine.
>
>Ok, you have opinions, but decline to produce convincing or original evidence of
>their truth.  It is your right to avoid substantiating your claims.  Don't
>appear shocked or dismayed if others don't adopt your opinions to the exclusion
>of their own without further discussion on the merits of your *and their* cases.
>Don't appear shocked or dismayed if others are not goaded by character
>assassinations made in public accompanied by wild claims of bias without substance
>(evidence, not mere speculation).

You don't need 100% proof of your case to win an intellectual debate - you
merely have to demonstrate that the weight of evidence is heavier on your side
of the scale than it is on theirs. This is the limit of what I've been saying.

So - Tiger beats Century 12-8. OK - it's at fast time controls. OK - this isn't a
massive statistical sample. I'm sure there are other flaws as well.
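To put a number on how small a 20-game sample is, the sketch below converts the 12-8 score into an implied rating difference and attaches a rough 95% interval. Assumptions: 12-8 is read as 12 points out of 20 for Tiger, games are independent, the interval uses a simple normal approximation, and the logistic Elo curve does the conversion. Draws would only shrink the per-game variance, so treating every game as a win or a loss makes the interval, if anything, on the wide side.

```python
import math


def elo_diff_from_score(fraction: float) -> float:
    """Rating difference implied by a score fraction (inverse of the logistic Elo curve)."""
    return 400.0 * math.log10(fraction / (1.0 - fraction))


tiger_points, games = 12, 20   # the quoted 12-8 result, read as points out of 20
p_hat = tiger_points / games   # 0.6
# Normal-approximation standard error, treating each game as an independent win/loss trial.
se = math.sqrt(p_hat * (1.0 - p_hat) / games)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"point estimate: {elo_diff_from_score(p_hat):+.0f} Elo")  # +70 Elo
print(f"rough 95% range: {elo_diff_from_score(low):+.0f} to {elo_diff_from_score(high):+.0f} Elo")
```

With these inputs the point estimate is about +70 Elo in Tiger's favour, while the rough range runs from around -80 to around +260: the evidence leans towards Tiger without ruling out the opposite.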

However, the implication is that Tiger is a stronger program than Century. It
may be that if you ran another 20 games, Century would win. However, on the
basis of the evidence we have, relatively weak though it is, Century is just as
likely to score fewer than its 8 points in another 20-game match as it is to
score more (it could, of course, achieve the same score).
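The "just as likely" claim can be checked under one simple set of assumptions: each game is an independent win/loss trial, and Century's true per-game score equals the 0.4 it achieved (a plug-in estimate that ignores the uncertainty in that 0.4 itself). The sketch below computes the binomial probabilities of Century scoring fewer than, exactly, or more than 8 points in a new 20-game match.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


n, p_century = 20, 8 / 20  # assumption: Century's true per-game score is its observed 0.4

p_worse = sum(binomial_pmf(k, n, p_century) for k in range(0, 8))        # fewer than 8 points
p_same = binomial_pmf(8, n, p_century)                                  # exactly 8 points
p_better = sum(binomial_pmf(k, n, p_century) for k in range(9, n + 1))  # more than 8 points

print(f"worse: {p_worse:.3f}  same: {p_same:.3f}  better: {p_better:.3f}")
```

Under these assumptions the result is roughly 0.42 for a worse score, 0.18 for the same score and 0.40 for a better one, so the two outcomes are indeed close to equally likely.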

So it is OK to say that the result implies that Tiger is stronger than Century.

From this, one could also speculate that, in a tournament against humans, Tiger
would better represent the programs than Century would.

That's it. I might have worded my original post in a way that makes more impact
on the reader than a simple statement of my position would have. As a keen
participant in debating competitions, I find that a good habit to have.

>Ok, others have opinions and have published their reasoning (however valid)
>regarding their opinions, and you wish to ride on their bandwagon and rest your
>case on their 'proof'.  Fair enough.  You have a right to agree with anybody's
>ideas.
>
>My above points fulfill my burden of proof.  I don't know what else to say.  To

I don't agree. As far as I can see, you have made a case that there are
potential difficulties in matching the rating scales between 2 different pools
of players. In my book, this does not constitute "fulfilling the burden of
proof" that the top computers are overrated.

>my knowledge I have addressed every major point both you and those whose
>opinions you agree with have raised in an attempt to support your position that
>comp-comp ratings (SSDF) have a bearing on comp-human (FIDE) ratings for comp
>programs.
>
>If you wish to debate or discuss further, it would be my pleasure.  Please,
>however, leave out the personal when waxing philosophical, as will I, as much as
>possible.  I promise to tone it down.  And please address *others' points*, not
>just repeat your own or those of others you agree with--we have heard them all
>many times before, ad nauseam.
>
>I don't expect to win you over.  I don't expect to 'win' any debate.  That is
>not the real enjoyment of this forum--winning an argument.  I do believe it
>would be loads of fun to discuss new points, analyze new evidence (for example, the
>Rebel match results), and explore the wonderful world of computer chess!
>Yes--that includes the rating controversy!  :)
>
>Take care, and once again, here's to a year of interesting postings--cheers!
>
>--Steve
>
>
>>>>>You are very good at taunting.  Ever think of being an attorney?  You would be
>>>>>ecstatic at cross-examining a hostile witness when the judge gives you a rather
>>>>>free hand.
>>>>
>>>>May I return the favour
>>>
>>>Yes, by all means (well maybe *not* by ALL means!).
>>>
>>>>and offer you some career advice as well, Stephen? You'd
>>>>make a good comedy script writer.
>>>
>>>I should think so.  I get enough practice reading and writing them here.  :)
>>
>>This is another notion that requires evidence and proof. If you wrote comedy
>>scripts, would the listeners tune in week after week, or would they switch off
>>after the 1st episode?
>
>Point 1--true.

Not to worry - all great comedians have "died a death" early in their careers!
One just has to persist, and keep improving (like the chess programs! :)  ).

-g

>Question 1--now you are getting the point!
>
>[I myself am bored to tears with banal and repetitive postings that add little
>to the body already published.  NOTE--This is not resentment directed at new
>posters, nor at those relatively new who haven't been through the same
>discussions of yore.  Nor is it a lack of respectful tolerance for continual
>polishing of old topics and ideas, as we learn to communicate ideas in better
>fashion, or interpret them anew in light of new thoughts or evidence.]
>
>--Steve Boak
>
>>-g
>
>>>--Steve
>>>>>--Steve
>>>>-g



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.