Computer Chess Club Archives



Subject: Re: Calculating Computer Ratings????

Author: Robert Hyatt

Date: 07:31:40 08/04/98



On August 03, 1998 at 15:39:47, Shaun Graham wrote:

>
>>
>>However, the "standard" still applies... If you *assume* all programs are 2400
>>and one is a "killer"...  then *that* program is going to be 2800+ very quickly,
>
>
>Robert, I think you are just trying to argue again :), because if there was a
>program so "killer" that it could defeat all top programs all the time, then it
>would be 2800!  Anand has just demonstrated that even he can't beat all top
>programs all the time!  Just say that we put Deep Blue in the mix; I doubt that
>even it would be 100%, probably not even 90%, but it would probably end up being

Not at all.  This has happened since the beginning of computer chess.  Go back
and look at "Tech", Chess 4.x, L'Excentrique, and so forth.  Today, I'd lump Fritz
into the "computer-killer" category.  I personally don't believe it plays as well
against humans as other programs do, because it is *very* susceptible to anti-
computer play, probably because its evaluation code is kept small to produce the
very high speed it reaches.

Anand showed that Fritz isn't a 2800 player, or even a 2600 player (its TPR
was actually (2400+2600)/2 = 2500, and even that is generous, because it is likely
that Anand could have won the last game had he wanted to expend the effort).

That's a problem, because a program built to beat other programs is not necessarily
the same as a program built to beat GM players.  I don't believe that "just
fast" is enough to take on GM players, although it can obviously be plenty of
trouble for programs that are smart but too slow to manage the tactics.  Fritz
doesn't play badly.  But its "knowledge" is nowhere near IM level; it is simply
playing "sound"-looking moves that lead to active-type positions where its
tactics excel.

>2690 or even 2700, and that's because it is (to say it isn't is literally an
>insult to Kasparov)!!  In test trials of doing just what I have said, the other
>programs are all in the 2450-2550 rating range.  None of them are, as you put it,
>"killer", for the reason that a 2800 program doesn't exist (unless it's Deep
>Blue :)).
>
>>solely because it can toast a program that seems to be 2400.  Starting at 2400
>>is wrong...  it should use the normal "performance rating" approach for the
>>first 24 games to get a good starting point.
>
>The formula that I have proposed is based on the provisional formula, but as I
>said earlier and someone else pointed out, with computers it makes no sense to
>stop using this formula after 24 games, because then you would be in a situation
>again of getting ratings incrementally, which does not work well with computers.
>And there is no good objection to starting the programs at an Elo of 2400,
>for the reason that we know that they are at LEAST this strength.  In fact, in
>USCF and FIDE rules, if a tournament director knows the strength of an unrated
>player, he at his discretion can assign a rating of strength (an Elo rating).  If
>that rating is less than the actual strength of the player, then his performance
>from that point on will push the rating to what it should be.  The only thing
>one might find from what you are saying is that the programs are even stronger
>than 2400, which would be borne out anyway if you started them at 2400 after they
>started winning games.  But by starting them out at 2400 you avoid the
>questioning of the ratings being too high.
>



Your approach has a good side and a bad side.  For a commercial program, "TPR" is
a perfectly acceptable way of calculating a rating, because the program doesn't
learn (other than book stuff) and it doesn't change.  For a research program
(including Crafty, Ferret, ZarkovX, WchessX, and so forth, i.e., programs that are
changing *daily*), TPR won't cut it, and that is why it is only used for the first
24 games.  I.e., suppose I start playing and lose the first 24 games, and then I
find the problem and win the next 10.  My rating will change very slowly if it
is based only on TPR, when the *real* rating of the program has obviously risen
sharply...
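
To put numbers on that, here is a toy sketch in C (the function name and the
all-2400-opponents assumption are mine, purely for illustration; the formula is
the common linear performance-rating approximation):

#include <stdio.h>

/* Linear performance-rating approximation: average opponent rating
   plus 400 * (wins - losses) / games.  Assume every opponent is 2400. */
double perf(double avg_opp, int wins, int losses, int games) {
    return avg_opp + 400.0 * (wins - losses) / games;
}

int main(void) {
    /* the scenario above: lose the first 24 games, fix the bug, win 10 */
    printf("TPR over all 34 games: %.0f\n", perf(2400.0, 10, 24, 34)); /* ~2235 */
    printf("TPR over the last 10 : %.0f\n", perf(2400.0, 10,  0, 10)); /*  2800 */
    return 0;
}

The whole-history number barely notices the improvement, which is exactly the
problem with using TPR alone for a program that changes.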


> And even then, the ratings won't
>>be comparable to FIDE because the games aren't played in that rating pool.
>>
>>
>>The main point is that computer vs computer is far different from computer vs
>>human.  Small differences in a single program can produce lopsided match results
>>when the two programs are equal everywhere else.
>
>Yes between two programs, but not between 7 or 8 programs. this lopsidedness is
>canceled out.


No it isn't, because any GM will tell you that playing a computer is *far*
different from playing a human GM.  And as a computer programmer, I can watch
a game and figure out whether I am playing a GM or a computer without having to
be told, because the computers (all of them) still make the occasional
"computer move" without making tactical mistakes.  I gave you a simple
experiment to run:  get any source-available program, and play a match
(a long match) between identical versions, to get a feel for the statistical
variability you will encounter.  Then make a minor change (i.e., reduce the value
of a pawn to .7 or increase it to 1.5) and play the match again.  If your change
was good, the match result will swing much more than that simple change would
be expected to produce.  If your change was bad, the changed program will do
much worse than expected.
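
For reference, here is one way to turn a match score into an implied Elo
difference (a minimal C sketch; the 55% and 70% scores are made-up examples,
not measurements, and the formula is just the standard logistic Elo expectation
solved for the rating difference):

#include <math.h>
#include <stdio.h>

/* Elo difference implied by a match score (fraction of points won). */
double elo_diff_from_score(double score) {
    return -400.0 * log10(1.0 / score - 1.0);
}

int main(void) {
    printf("55%% score -> %+.0f Elo\n", elo_diff_from_score(0.55)); /* about  +35 */
    printf("70%% score -> %+.0f Elo\n", elo_diff_from_score(0.70)); /* about +147 */
    return 0;
}

A change worth only a handful of Elo against humans can easily show up as a
70% head-to-head result, which is the distortion being described.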

*Then* play both versions for a couple of hundred games on ICC, and I'll bet
their ratings are going to end up very close.  I have a current version of
Crafty that is playing very badly against computers.  This has been going on for
over a week, and I haven't had a chance to search for the problem yet.  But it has
still been hanging around 2950 or so, and hardly *ever* losing against humans
on ICC, even though it has lost nearly every computer game it has played.  That
is a simple example...  whatever the bug is, it is a significant difference
between *my* program and the others that are playing there.  But it is really
inconsequential when playing humans, for some reason...





>>
>>
>>
>>>>
>>>>The only way to get reasonable Elo ratings for programs is to play them against
>>>>humans, and not against each other.
>>>
>>>I know that this type of statement seems intuitively correct.  The fault that I
>>>find with it is this: firstly, in testing for a rating it is best to test
>>>against multiple opponents, and all opponents will not spot the weakness, or
>>>necessarily take advantage of it in the same capable way.  The reason that you
>>>can get a relatively good rating is because you are testing against multiple
>>>styles of play from different opponents.  This is slightly conceptually
>>>difficult to understand, so I will provide an example.  "I have an opponent: he
>>>is rated lower than me, but this particular opponent has a style that causes me
>>>particular difficulty and he beats me almost all the time.  Yet I perform in
>>>tournaments to a degree he has never come close to.  So he is like your
>>>computer that always spots the weakness, but my play against multiple opponents
>>>counteracts this effect on my rating.  If however you have a program that loses
>>>to all (or almost all) opponents, regardless of the reason you give, ultimately
>>>that program lost because it is weaker, thus its rating will drop.  Human players
>>>of comparable strength to the winning computer would quite likely take advantage
>>>of the weakness as well.
>>
>>
>>your "weakness" isn't a weakness.  It is based on statistics and if you play
>>a pool over and over, and your friend plays the same pool of players, your
>>ratings are going to stabilize at points that reflect your skills against
>>that pool of players.
>
>What you are saying misses the point: we have played in the same pool over and
>over and I'm 200 points higher rated, yet he beats me.  Why does he beat me?  The
>answer, as best I can figure it, is simply that he has a unique combination of
>just happening to be strongest at playing against the lines that I play, and his
>strengths simply happen to be where my weaknesses are.  That Conan Doyle axiom:
>when you have examined all other possibilities, whatever is left must be the
>truth.  We have both played a significant number of games against each other,
>and a significant number of tournament games against the larger pool.  He beats
>me, but can't perform against the pool as well as I can.  Amongst chessplayers
>this sort of thing is not a rarity.  Fischer once said of Tal, "He's beaten me 4
>times in a row, but I still say he plays unsound chess" (Geller's name could have
>even been substituted in that quote).  So the result is that overall I am the better
>player (against the pool), but when the pool is reduced to just him and me, it
>appears he is the better player.  Counterintuitive, certainly, but that's the way


I had the same situation in 1970.  My first official USCF rating (provisional)
was 2258 after 17 games...  because there was a guy in the Birmingham Chess
Club who was a 2200 player, yet he could not cope with my wild attacking type
of chess.  I played him 3 times in tournaments here and won 2 and drew 1.  So
it can happen, but it is obviously a statistical anomaly...

But remember, this *is* statistics, and *not* certainty...  so that kind of
situation is *guaranteed* to happen occasionally, and not totally unexpected...





>it is.  It's sort of as if he has "anti-Shaun" programming: it works well
>against Shaun, but not against the rest of the pool.  Further, considering that
>I'm the current Reserve state champion, and also the former blitz champion, it
>doesn't have anything to do with me being inconsistent or unlucky.
>
>>It does *not* suggest that the two of you are going to
>>have a specific game-outcome with 100% reliability... it just says that after
>>both have played that large pool of players, your probability of winning against
>>them is X, while his is Y.
>
>Yes i may have a probability of scoring x against the pool and he has a
>probability of y, but that does not necessarily and in this case obviously
>doesn't have much to do with his probability against me.


This probably means neither of you plays enough... because there are other
players that can do the same to either of you...  if you play enough to find
them...



>>
>>When you take your example, you are highlighting the very issue I bring up in
>>computer chess:  a program to beat other programs is quite a bit different from
>>a program to beat humans.  And when you play comp vs comp, you find the program
>>that is best suited to beat other programs (fritz, for example) while when
>>playing comp vs human, you will probably find that a *different* program will
>>maintain the highest rating...
>>
>This is a logical possibility only if the pool of programs you are testing are
>considerably alike in their play.  However, if against a pool of programs with
>considerably different styles a single program performs the best against this
>pool of multiple styles, then the likelihood will be that it is the
>program that plays best against humans as well, for the reason that it can deal
>with multiple styles of play the best.  As far as what I have seen, there is
>considerable difference in the styles of programs: Genius (an overly defensive
>player), Chessmaster (master of the initiative), Hiarcs (a bum program :)), Rebel
>(a proven, solid, mostly positional style), Fritz (a magician).
>
>



My evaluations:  Hiarcs: smart, but too slow.  Fritz: not smart, but very
fast.  Rebel: in the middle.  Which would I bet on in a match?  Easy: Fritz.
Which would I bet on against humans?  Probably Hiarcs, with Rebel right behind
(or right ahead, as they would be close).  Both of these programs have enough
"smarts" to throw kinks into many anti-computer plans, while Fritz simply walks
right into them...





>>
>>
>>
>>>
>>> Computer vs Computer is a vastly different
>>>>game than computer vs human.  You can take any program, make a fairly serious
>>>>change to the eval, or to the search extensions, and not see much difference
>>>>against a pool of humans.
>>>
>>>If this is a weakness which has been induced such that it is always taken
>>>advantage of by computers, it can be taken advantage of by quite a few humans as
>>>well, and if it isn't then the program is still stronger overall than the human
>>>which was played.
>>
>>
>>problem is  it can be a "tiny" weakness.  But if two programs know *everything*
>>that the other one knows, the one with one extra piece of knowledge (assuming it
>>is useful of course) has an advantage.  IE two trains on the same track heading
>>in opposite directions, one has 12,400 horsepower, the other has 12,401.  The
>>extra horsepower is going to eventually win the pulling contest.  While a
>>human probably couldn't tell the difference...
>
>This sort of analogy doesn't really work with chess, for the reason that in
>every game that 1 extra thing isn't always going to be applicable in the
>positions of the game.  Program "A" may have a slight tactical weakness, and
>plays the Sicilian, and gets munched; the next game it plays, a closed Ruy, the
>effects of the tactical weakness won't necessarily be able to be exploited to the
>same degree or at all, and it may be stronger positionally and win the game.
>This is because sometimes having that 1 extra thing is a weakness, and sometimes
>it is a strength; it depends on the game.


I would never discount the opening book.  But that evens out eventually if a
program has learning.  Then all that is left is the differences in the engines,
and regardless of all the hype you read here, most programs are *very* similar:
alpha/beta, hashing, some sort of endpoint evaluation, search extensions, and
so forth.  Some add null move and move further away from the others in terms of
similarity, but nothing says null-move is a clear winner (note that DB doesn't
use it, and still plays "pretty strong" :)  )




>>
>>This makes version-a vs version-b testing *very* difficult.  Because it is
>>actually possible to write code that is worse, but which produces a better
>>match result against that old version.
>>
>>
>As i said it's pointless to simply test "A" and "B" you must test against many
>opponents.
>>>
>>>But a strong computer opponent will quite quickly
>>>>"home in" on such a problem and make you lose game after game.
>
>As i said different games require different strengths, so a program shouldn't
>lose game after game, unless ultimately it is much weaker against the larger
>pool as well.


About a year ago, we played a long series of matches, Crafty vs Fritz, on FICS.
Crafty won nearly every game.  Each game was decided by passed pawns, except
for just a couple.  It became apparent that Crafty had some important (in those
games) bit of knowledge it was using, and Fritz didn't.  A result like that
would lead your rating system to conclude that Crafty was Fritz+400 in rating, yet,
in reality, there was no magic.  I have since changed enough that this doesn't come
close to happening today, but it clearly shows that this *can* happen...
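
For what it's worth, here is the arithmetic behind a "+400" conclusion, as a
minimal C sketch (the function name is mine; the formula is the usual logistic
Elo expectation):

#include <math.h>
#include <stdio.h>

/* Expected score for a player rated "diff" points above the opponent. */
double expected_score(double diff) {
    return 1.0 / (1.0 + pow(10.0, -diff / 400.0));
}

int main(void) {
    /* a +400 edge predicts roughly a 91% score, so winning nearly every
       game of a match looks like a 400-point gap to a rating system */
    printf("+400 Elo -> expected score %.0f%%\n", 100.0 * expected_score(400.0));
    return 0;
}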

When you factor in the commercial authors' continual auto-testing, note that
things *really* break down, because now it would seem that the last released
program is always better than the others...  And a program-program competition
would only exaggerate this and inflate ratings like mad...




>>>>
>>>>Ie in the famous first paper on singular extensions, Hsu and company reported a
>>>>really significant rating change, when comparing DT with SE to DT without.  They
>>>>later noticed that the difference was way over-exaggerated, because the only
>>>>difference between the two programs was SE.  Their last paper suggested that SE
>>>>was a much more modest improvement.
>>>
>>>I'm not certain who or what they were testing against to get the rating, but if
>>>the testing was done only against one or two opponents (which I strongly
>>>suspect), then this is where the error lies.  It's just like the weaker player
>>>I mentioned who always beats me: if you based his rating strictly on the games
>>>between us, his rating would be over 2250 (hundreds of points stronger than his
>>>real strength).
>>>>
>>>>If I simply took crafty as 2,000 Elo, for version 1, and then played each
>>>>successive version against the previous one, and used the traditional Elo rating
>>>>calculation, I would now be somewhere around 5500+.  Because minor changes make
>>>>major differences in the results between A and B, yet do very little in A vs H,
>>>>where H is a human.
>>>
>>>This would be far from the case however if you placed it in a pool with all top
>>>programs.
>>
>>
>>Maybe, or maybe not.
>
>Come on Robert, you know there is no way Crafty would be 5500 against a pool of
>multiple opponents.

Didn't mean to suggest that... Just the simple fact that that is *exactly*
what happens in the SSDF list, because old versions of X play new versions of
X, and get their clocks cleaned.  In fact, the newest version of X is almost
guaranteed to beat older versions of *all* programs because of the testing
done prior to releasing them.

And that gives corrupted statistics...  and that is why playing in a human
"pool" would be better...





>
>>Because it might do well in that pool, but get swamped
>>by a group of strong humans.  Or it might do badly in *that* pool, but swamp
>>a group of humans that would wipe that "electronic" group clean.  The issue of
>>a "rating" is foreign to what is being done so far.  The only way to get a
>>rating (estimate of outcome vs players in a known rating pool) is to play in
>>that pool.  Not in a comp-vs-comp pool that will almost guarantee a vastly
>>different rating order...
>
>There might be some difference, but I doubt there would be a significant
>difference, once you have established that a set of multiple programs (of different
>styles) are at least human 2400 Elo.  This is for the reason that usually, when
>you separate two pools, the results don't become immediately skewed but rather
>begin to skew as the two populations change.  However, because computers do not
>change, as do other sorts of pools (of comparable things), you simply
>will not have an increasing divergence.  Since this divergence is considerably
>eliminated, the pools themselves aren't really that much different from each
>other.


Just visit the SSDF site and look at the record of a new program vs all the
old programs...  you will find plenty of data to show where this will screw up.
The idea *might* work if the programs had *never* played each other before they
met in your "match".  But this doesn't happen, because of auto-testing...




>>
>>It will certainly predict the outcome for the comp-vs-comp games... but no one
>>uses the number like that... they try to extrapolate the results of comp vs
>>human games, based on the rating obtained in comp vs comp games.  And it won't
>>work, ever...
>>
>>
>
>It of course would be best if you could have many test games against humans, but
>only until it has been established that the programs are human
>Elo (preferably 2400).  The pools will be significantly similar, because the
>computers will not diverge (in other words, a computer 2400 will not change).
>Though it is foreseeable that in the human pool the meaning of a human 2400 could
>possibly change, I suspect this is unlikely.  Because this is a unique
>situation where the possibility of significant divergence is limited, it makes
>testing between computers comparable with games between humans.  Currently the
>only problem (at least with the SSDF) is that they didn't start with a known
>comparative value, i.e., a 2400 Elo rating for a computer.
>>


I suspect that if you play computers vs humans, you will find wild variance
in the results, just as we see on ICC.  And I also suspect that you would see
a computer's rating *steadily* decline as humans learn what it can and can't
do and adjust to it.





>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Because programs only learn to avoid certain lines, they really
>>>>>>don't learn like humans anyway so no rating system will make their ratings like
>>>>>>human ratings. Besides the SSDF list is only good for comparative purposes.
>>>>>
>>>>>That's the problem: it's not good for comparative purposes.  I wish it was; I'm
>>>>>sure you have seen my discussions on here demonstrating how Fritz is GM strength
>>>>>(which it is).  However, apparently it's difficult to show that using the
>>>>>current SSDF system, because OBVIOUSLY many people don't accept it.  If they did,
>>>>>when I said Fritz is GM strength because its Elo is 2589, there would be no
>>>>>disagreement.
>>>>>
>>>>
>>>>
>>>>the problem is that SSDF has too much inbreeding in the ratings.  And no one
>>>>has ever taken the time, nor gone to the expense, to enter a computer into FIDE
>>>>tournaments (FIDE membership is possible, but finding a tournament that would
>>>>allow computers might be much more difficult).  So it is quite possible that
>>>>fritz, at 2580, is 400 points better than the fidelity mach III at 2180.  But
>>>>would that hold up in human events?  I doubt it.  I suspect Fritz would lose
>>>>more than expected, and the Mach III would win more than expected.  For the
>>>>reasons I gave above.
>>>>
>>>
>>>As for Fritz, it might do better or worse than a 2580 Elo; it depends on lots of
>>>factors, such as tournament type.  Fritz would do very well in a swiss-system
>>>tournament.  It would still be 2500 Elo at the least, I'm certain.  Playing in
>>>human tournaments would be quite beneficial, though I would always much prefer
>>>data from swiss events as compared to invitationals.
>>>>
>>>>
>>>>
>>>>>
>>>>>You
>>>>>>are attaching too much importance to the isolated rating number.
>>>>>
>>>>>No, I'm not.  Ratings are all-important; they're the only way to show the relative
>>>>>strength of computers against human strength.  Thus it is very important to isolate a
>>>>>VALID rating for a program, firstly so that you can know how computers really
>>>>>compare to humans, and secondly so that we can gauge exactly how far along
>>>>>the evolutionary track programs are.
>>>>>
>>>>
>>>>
>>>>
>>>>correct, but not easily doable.  IE computer vs computer has *nothing* to do
>>>>with computers in the human-chess-tournament world.
>>>
>
>I disagree for reasons given above.


I have a computer that can most likely beat any program you care to supply in
a match.  It searches about 10M nodes per second, and in your pool, I'd be
surprised if it didn't win well over 3 of every 4 games it played.  Yet I
can guarantee you that in a human tournament, this "monster" and a program
like "Rebel" (or even Crafty) would not be widely separated in TPR.  DB is
another example.  Its performance against programs would suggest that it is
at least +400 rating points better.  But when playing humans, I don't believe
its actual rating would be +400 higher.




>
>
>>>Well i would agree that one computer vs a single other computer wouldn't mean
>>>much, but against multiple styles of programs, indeed i believe you can garner a
>>>rating of relatively strong reliability.
>>>
>>
>>
>>Stop and ask a good GM about computers and humans for opponents.  He'll give
>>you more info, quicker, than I can.  But the ones I know can (a) tell almost
>>immediately (within a game or two) if they are playing computers and (b) will
>>alter their style knowing this.
>>
>
>???
>>
>>
>>> Because it is all about
>>>>statistics, and given two different "pools" of players, the absoluate ratings
>>>>would vary significantly, and the spread would vary as well, because the
>>>>expected outcome of computer vs computer is different than computer vs human.
>>>
>>>This statistical outcome is only different because of the point that this thread
>>>is making, that point being that the current rating system of calculating computer
>>>ratings incrementally like humans isn't accurate for computer usage.  I believe
>>>the procedure I have outlined is a fair degree more accurate and would be a
>>>right step in making the ratings of computers have the same statistical effect
>>>against human populations.  There is a problem, though, of human bias that I won't
>>>go into too much detail about, such as the fact that I can beat CM 4000
>>>Turbo a higher percentage of the time than average, because I beat it once and I
>>>can often repeat the same game; there is some possibility of this in tournament
>>>play for computers.  And anti-computer chess as opposed to regular chess play,
>>>though I'm starting to suspect that for the non-grandmaster, attempts at
>>>anti-computer chess will garner more losses than wins.
>>
>>
>>At sub-IM levels, I agree.  But IM's are quite capable of using "anti-computer"
>>strategies and most are positionally and tactically strong enough to not get
>>in over their heads.  The danger is taking "anti-computer" to the level where
>>you end up in a position you don't quite understand, or worse...
>>
>Ask Dean Hergott.  Neither here nor there, as I hold that the top programs are GM
>strength, so it really doesn't matter what the IM does.  GM de Firmian says in
>"How to Play Better Chess": "If you are a GM you should be able to overpower the
>IM tactically.  The GM will often blow out the IM in this area" (p. 6).

GM de Firmian is full of "it" too.  There are more than a few IMs that are just
as good as most GM players, but have not had the time to earn the required norms
yet.  His statement is silly.


>Considering that tactics are almost universally argued to be the strong points
>of computers, this is a considerable bolstering of their position against IMs.
>No IM I can think of would have held that 40/2 match that occurred between
>Rebel 10 and Anand.


I can think of several that would probably have done better, in fact, because
they are much more "up-to-date" on playing computers.  Martin Borriss is one
*very* strong computer-killer.  Brian Hartman is another (Canadian).  Orlov,
Commons, Kopec, Valvo, several come to mind.


>>
>>
>>>>
>>>>Fritz is ideally suited to play other computers.  Very fast, very deep.  But
>>>>I'd expect it to do worse than some other programs against a group of GM
>>>>players.  Anand was an example. Shredded Fritz trivially, had to work to beat
>>>>Rebel in the two slow games.  Yet I'm sure that fritz will beat Rebel in a
>>>>match, as has been seen on SSDF.
>>>
>>>Well, Fritz didn't get to play any 40/2 games, so I don't know how it would
>>>have done.  I would, though, point out that I have read a quote of Anand saying he
>>>plays Fritz all the time.  When I play weaker opponents at the club, who for
>>>some reason don't want to change the way they are playing, my win percentage
>>>increases; Anand had this advantage with Fritz.
>>>>
>>>
>>>I'm personally beginning to think that Chessmaster tested on the new, faster
>>>hardware is the strongest program; it has no optimized books, it isn't tuned
>>>against other programs, and yet it just beat Rebel 7 to 2 in a 40/2 match on
>>>Leubke's page.  That's besides my own testing, which bears out pretty much the
>>>same, though not quite, I'd think, 7 to 2.
>>>
>>
>>
>>
>>If you go back to r.g.c.c a couple of years ago, I pointed out that of all the
>>programs I was playing on the chess servers, ChessMaster consistently gave me
>>the most problems.  It is good, and will continue to be good, IMHO...
>>
>This we can agree on, Chessmaster is benefitting hugely from the faster machines
>now(the 5000 engine was only tested on a P90).  I wouldn't be surprised by a #1
>ranking of 6000, though i think the strength really starts to kick in when
>tested on at least a P233.  Unfortunately,  the opening book sucks:(.
>

Yes...  but it seems to have an active style, with decent moves, and very good
tactics...  Dangerous combination, IMHO.



>
>
>>
>>>>I'm more interested in the computer vs human games, but I do pay attention to
>>>>computer vs computer when possible...
>>>>
>>>>
>>>>> Ratings abhor a
>>>>>>vacuum. You need lots of competitors to have a good system and the SSDF is a
>>>>>>closed shop.
>>>>>
>>>>>No they are not a closed shop, as the data is readily available to be examined
>>>>>and calculated by anyone with the inclination.  They have no stranglehold on the
>>>>>knowledge of how to calculate ratings, and if you look at another of the follow
>>>>>ups to this post, you will find that the SSDF is in fact instituting a plan
>>>>>similar to the one that I have suggested (recalculating from scratch, not
>>>>>incrementally).
>>>>>
>>>>
>>>>but it has the same problem..  because someone will still assume that Elo 2500
>>>>is the same as SSDF 2500.  And it still won't be so, until the games come from
>>>>players in a *common population*...
>>>>
>>>No, it's not a problem, because on a corrected system an SSDF 2500 would be
>>>relatively equivalent to an Elo 2500.  Games within a common population would
>>>make it even more accurate (more than likely); despite this, though, you can still
>>>come to a relatively accurate rating.
>>
>>
>>Here we just have to agree to disagree.  Elo is all about sampling theory and
>>probability analysis.  There is *no* way to normalize ratings between two
>>different sampling groups.  Other than to combine them and let them play in
>>the same pool.
>>
>>
>
>Yes, you can, if you can't find a significant difference between the groups,
>especially when they have been pooled at times, and there is no reason for
>the two groups to diverge in nature once separated into two pools.

Again, after you play lots of comp vs comp games, you will begin to see why
this actually does happen...  And note that the SSDF and FIDE *never* had a
common pool.  No computers have ever had a FIDE rating, so there's no valid
way to "seed" the SSDF.  Some of their numbers come from USCF ratings, some
from others, but after 5 years (or more) the ratings and strengths have diverged
so far as to be statistically meaningless...





>>>>
>>>>
>>>>
>>>>>
>>>>>Shaun


