Computer Chess Club Archives

Subject: Re: Chess Tiger - Is It Really 2696 ELO?

Author: Robert Hyatt
Date: 12:05:57 12/23/99
On December 23, 1999 at 06:32:32, Graham Laight wrote:

>On December 22, 1999 at 21:48:40, Robert Hyatt wrote:
>
>>On December 22, 1999 at 19:03:34, Graham Laight wrote:
>>
>>>On December 22, 1999 at 15:07:42, Albert Silver wrote:
>>>
>>>>>At the end of the day, good chess is good chess. A machine that can beat more
>>>>>computers is also likely to beat more humans.
>>>>>
>>>>
>>>>That's really the core of the issue, and I don't agree with it. I used to, but
>>>>as I grew stronger in chess, I changed my mind. It isn't because I am way up
>>>>there, but because I can better appreciate the difference between myself and an
>>>>IM for example. The point is 80-90% of computer chess is dependent on tactics,
>>>
>>>As computers continue to get stronger, strong chess players are going to have
>>>to accept that there's more than one way to play good chess. Daniel King
>>>suggested this in his book about the GK/DB 1997 rematch in New York.
>>>
>>>>and let's say up to a strength of 2100-2200, this is also very true for human
>>>>players, but then a new important factor comes in and the balance swings
>>>>completely. Most IMs and GMs rely on their positional play, and this weighs in
>>>>more and more as a rule the stronger they get. This is not the case of computer
>>>>programs. Not by a long shot. And since no program is sufficiently strong
>>>>positionally to properly compensate inferior tactics with superior positional
>>>>play, the tactical wizards consistently top the lists.
>>>
>>>This doesn't quite seem to add up to me. More and more frequently, we are
>>>reading about GMs succumbing to computers at tournament time controls. DB v GK
>>>was a good example. In the last Aegon tournament (1997), the computers beat the
>>>humans overall. If the limit of tactical strength has been reached by computers,
>>>and if computers do not have mastery of positional factors, then what's going
>>>on?
>>>
>>>I'm still not happy that I agree with your and Bob's assertion that SSDF rate
>>>the computers too highly. It's true that there is a tendency for new programs to
>>>come in with very high Elo ratings, and then shrink back with the passage of
>>>time, but these guys are very experienced at what they're doing. They admit that
>>>there's a margin of error, but, over a long period of time, haven't they been
>>>around about the right order of magnitude with their ratings?
>>>
>>>If you don't believe that Tiger is significantly over 2600 Fide, then in the
>>>recent past, something has gone very wrong in the SSDF team.
>>
>>The problem is well-documented.  If one pool has nothing but monkeys, and
>>the other nothing but chess geniuses, you will still have 1200 humans
>>and monkeys, and you will have 2800 humans and monkeys. And the ratings
>>won't have a thing to do with each other, because there is no
>>cross-pollination of the rating pools.
>
>In the case of the SSDF computer pool, much of it has been there for a long
>time, and is known to be broadly correct.


No it isn't, but I also don't have enough time to turn this into a "statistics
101 course" and explain why it doesn't work like that.  If you look at how the
Elo formula works, the _last_ game you played influences your rating _far_ more
than the game you played 40 games ago.  That is simply how the statistics work
here.  So even if the original SSDF programs were _perfectly_ calibrated to some
human rating scale, the fact that 10 years have elapsed means that the effect of
that calibration is _long gone_...
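
A rough way to see this fading effect is to replay the same string of games and
flip one result. The sketch below assumes the simple incremental update
R' = R + K * (S - E) with a constant K of 32; the game history, the K value and
the function names are all made up for illustration, and this is not a
description of how the SSDF actually computes its list.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def replay(start, games, k=32.0):
    """Apply the incremental update R += K * (S - E) game by game.

    K = 32 is an assumed constant, chosen only for illustration.
    """
    rating = start
    for opponent, score in games:
        rating += k * (score - expected_score(rating, opponent))
    return rating


def flip(games, index):
    """Return a copy of `games` with the result at `index` reversed."""
    changed = list(games)
    opponent, score = changed[index]
    changed[index] = (opponent, 1.0 - score)
    return changed


# Hypothetical history: 41 games against 2200-rated opposition,
# alternating loss and win.
games = [(2200.0, float(i % 2)) for i in range(41)]

base = replay(2200.0, games)

# How far does the final rating move if one result is reversed?
effect_last = abs(replay(2200.0, flip(games, -1)) - base)  # most recent game
effect_old = abs(replay(2200.0, flip(games, 0)) - base)    # 40 games earlier

# Same games, two different starting "calibrations" 400 points apart.
gap_after = abs(replay(2400.0, games) - replay(2000.0, games))

print("effect of flipping the last game:", effect_last)
print("effect of flipping the game 40 games ago:", effect_old)
print("gap left from a 400-point starting difference:", gap_after)
```

Under these assumptions the most recent result moves the final rating by the
full K, the old result moves it by much less, and the 400-point difference in
starting rating shrinks with every game played.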



>
>And in the past, the evidence has supported SSDF - there just hasn't been much
>in the way of evidence in the last couple of years.
>
>If you regard your "human" rating to be about 2200, can you beat programs (other
>than your own, which you know too well) of a higher rating? If you believe that
>the computer ratings are about 200 Elo too high, you ought to be able to.


I can trounce the Fidelity Mach III for absolute certain.  And it was
rated at 2265 by the CCR, and just under 2200 on the SSDF when it was
first rated.  I certainly am not a 2400 player, yet I beat it as if I
am.

So yes, I think the ratings are inflated, with respect to FIDE.  I think
the ratings are fine considering that (a) the pool of players is computer
only, (b) there are lots of games among _all_ the pool players, and (c) the
_absolute_ rating means nothing anyway.  It is the _difference_ in rating
between two players that is important.  You could add or subtract 1000 from
all the SSDF ratings and nothing would change...
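
The point about differences can be checked directly against the standard Elo
expected-score formula, which only ever sees the gap between two ratings. This
is a small sketch; the rating pairs are arbitrary illustrations and the function
name is an assumption, not anything taken from the SSDF.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Arbitrary example pairings (player rating, opponent rating).
pairs = [(2696.0, 2500.0), (2265.0, 2200.0), (2400.0, 2200.0)]

# Adding 1000 to both ratings leaves every prediction unchanged.
shift_invariant = all(
    abs(expected_score(a, b) - expected_score(a + 1000.0, b + 1000.0)) < 1e-12
    for a, b in pairs
)

for a, b in pairs:
    print(a, "vs", b, "expected score:", round(expected_score(a, b), 3))
print("unchanged after adding 1000 to everyone:", shift_invariant)
```

A 200-point gap works out to an expected score of roughly 0.76 for the higher
rated player, whatever the absolute numbers are.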





>
>>I have watched Tiger play.  It _absolutely_ is not a 2700 FIDE player.  Nor
>>is any other program IMHO.
>
>But can a GM guarantee to know what good chess looks like?
>
>A lot of GMs strongly criticised much of DB's play against GK - often using
>phrases like "that move was truly ugly", thus implying that to be a good move, a
>move has to "look attractive" - but in the end DB came away with the points.
>


There was a lot of disagreement.  At one point DB played something like g5.  It
was roundly criticized.  Yet Kasparov said "that was the only move to try..."

Beauty, it seems, is often in the eye of the beholder.  And yes, I believe that
some truly ugly moves are only ugly to humans, but are brilliant in light of
what the computer sees. :)






>What is wrong with the way Tiger plays? Can you describe to me the aspects of
>its play which have convinced you that it is not anywhere close to being a super
>GM, as its rating would imply?
>
>-g



It can't handle blocked positions, which means it is going to be very prone to
exploitation.  It can be fixed.  But it isn't there yet.  It has some problems
in particular types of endgames, the type that GMs 'sense': they find out
"hey, it will go into this if I give it a pawn up, not knowing that it has
no winning chances at all."  Of course, Tiger is not the only program with
endgame problems.  I don't know of a single one (mine included) that doesn't
have enough holes to drive a truck through.  But just watch it or any other program
play.  Tactically accurate.  But positionally weak at times.  Come to ICC and
watch players like 'vic11', 'cptnbluebear' and other GMs.  Watch their
positions.  Then watch the positions created by computers.  There is a vast
difference...