Computer Chess Club Archives


Subject: Re: Educated guess needed!

Author: Don Dailey

Date: 14:09:15 12/09/98


On December 09, 1998 at 15:07:05, John Coffey wrote:

>On December 09, 1998 at 06:36:02, Mark Young wrote:
>
>>Many of use have played over the games of Deep Blue Vs GM Kasparov , and Rebel
>>10 Vs GM Anand.
>>
>>My question is what do you think would be the stronger chess program, and by how
>>much:
>>
>>Deep Blue, or Rebel 10 (K6 450Mhz) * 1000
>
>
>My understanding is the Deep Blue is designed to be fast as possible, so we can
>assume that its evaluation function is relatively simple.  But slower programs
>often have to have a more complex evaluation to make up for the lack of speed.
>
>If by some miracle Rebel 10 could examine as many nodes as Deep Blue then it
>would probably win.  This is not ridiculous nor impossible nor would it take
>a hundred years to happen (maybe.)  If you look at how computers have gotten
>a thousand times or more faster over the last 20 years, then is it possible
>that they could get a thousand times faster over the next 20 years?  We don't
>know the answer yet because we don't know if we will hit theoretical limits.  If
>we do hit such limits then will we be able to find ways around them?
>
>Hang onto that Rebel 10 program.  I will be curious just how it plays 20 years
>from now.  I wonder if we will still have DOS 20 years from now?  (Or Windows?)
>Both will probably require some sort of emulator to run.
>
>John Coffey

Hi John,

Deep Blue does not have a "simple" evaluation function by any
means.  It probably has at least as many terms as any micro,
maybe more than most or all of them.  But term count doesn't mean a
whole lot when you don't know exactly what those terms
are and what they do.  My program, for instance, has 202 terms
in it; that is not very many, but they cover a whole lot of ground.
6 of these terms are basic piece values and a few more are
terms that modify these values depending on the situation.
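
To make "terms" concrete, here is a minimal sketch of an evaluation
built as a sum of terms.  Everything in it is an assumption for
illustration: the piece values, the bishop-pair bonus and the function
names are invented, not taken from Deep Blue, Rebel or my program.
It shows six basic piece values plus one term that modifies them
depending on the situation.

```python
# Hypothetical illustration only: the values, the bonus and the names
# below are invented, not taken from any real chess program.
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}


def material(counts):
    """Sum of basic piece values for one side, given {piece: count}."""
    return sum(PIECE_VALUES[piece] * n for piece, n in counts.items())


def bishop_pair_bonus(counts, bonus=30):
    """A term that adjusts piece values depending on the situation."""
    return bonus if counts.get("B", 0) >= 2 else 0


def evaluate(white, black):
    """Score in centipawns from White's point of view."""
    terms = (material, bishop_pair_bonus)
    return sum(term(white) - term(black) for term in terms)


# White has the bishop pair, Black has the knight pair.
white = {"P": 8, "N": 1, "B": 2, "R": 2, "Q": 1, "K": 1}
black = {"P": 8, "N": 2, "B": 1, "R": 2, "Q": 1, "K": 1}
score = evaluate(white, black)
```

Adding another term is one more function in the tuple, which is why
term count by itself says so little.  The hard part is choosing what
each term measures and how much weight it gets.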

But it is very simple to add terms to your program.  The only
question is how does this affect the strength of your program?
If all I had to do was add terms to make the program stronger,
I would gleefully add hundreds of thousands of terms.

The reason I bring this up, is that several months ago, someone
used Deep Blue's term count as evidence for how strong it must
be.  I don't remember or care who said this, but at the time
I realized this is a horribly poor way to measure chess strength.
As far as evaluation is concerned, the first order of business is how
well the weights of the terms are adjusted, followed closely by
how well the terms are chosen (or what they actually measure).
The least important factor is the actual number of terms used.
You can of course play tradeoff games: 1000 weak terms might
very well be equivalent to 100 solid, well-chosen terms.

But it is NEVER bad to have extra terms (assuming you don't care
about the slowdown, which of course is not an issue with Deep Blue).
The only issue is how much good they are actually doing.
The fact that Deep Blue has a huge number of terms is some
evidence that it has a high-quality evaluation, and I believe it
probably does have a very good evaluation.  But it is not
proof by any means.

About the issue of how much speedup a micro (Rebel on a 450 MHz
machine, for instance) needs to equal the current Deep Blue, it's a
pretty open question.  I have given my educated guess of at
least 5x, but no more than 100x.  This range reflects that
no one really knows how good Deep Blue is.  But we can
certainly make an educated guess, and this is so much fun to
do that it goes around on this group every few months.  Part
of my "guess" is based on the fact that the version
of Deep Blue that last played against micros lost
a game to a 90 MHz Pentium and drew an endgame against a
slightly faster Pentium.  You cannot accurately measure
chess strength based on a sample of games this tiny, but
what you can do is start to build a reasonable upper bound.
Deep Blue has been modified since then, of course, and is much
stronger.  But the micros that
competed in this tournament are also a factor of 4x or so
faster now; they would finish near the bottom today in a similar
tournament without more up-to-date hardware.  This is not
to mention that the software is a lot better too.

It was certainly a fluke that Deep Blue didn't do a lot
better at this tournament; clearly it was best by a
significant margin and was unlucky.  But what this
tournament revealed, in my opinion, was that the program
at that time was not likely to be more than a couple of hundred
rating points stronger than the other good programs of that
day.  It would be unlikely to score this poorly if
you were to assume it was at least 300 rating points
stronger than the BEST entry at the tournament.  It was certainly
300 points better than the AVERAGE entry.
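
For a feel of what these rating gaps mean per game, here is a small
sketch using the standard Elo expected-score formula.  The function
name is my own, and applying the formula to this tournament is an
assumption for illustration, not a calculation from the actual results.

```python
def expected_score(rating_diff):
    """Expected score per game (win = 1, draw = 0.5) for the
    higher-rated player, using the standard Elo logistic formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Expected score for a few rating advantages.
for diff in (100, 200, 300):
    print(diff, round(expected_score(diff), 2))
```

Under this formula a 300 point favorite expects roughly 85% of the
points against that opponent, and a 200 point favorite roughly 76%.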

Having said this, it's still just a lot of guesswork.  It's
possible that Deep Blue was hundreds of points better than
Rebel and was the victim of an incredibly rare statistical
anomaly.  But throughout its illustrious history, although
it has clearly dominated everyone else, it has from time
to time taken a loss or a draw from a micro.  It's not unbeatable,
and we don't have to talk about these ridiculous numbers (3 orders of
magnitude, 1000x?) to get equivalence to powerful modern-day
programs.

I would also like to suggest that when you talk about a 100X hardware
improvement you are talking about 100X more memory too.  If
you speed up Rebel 1000X, the extra memory will be critically
important to its performance.


- Don



Last modified: Thu, 15 Apr 21 08:11:13 -0700
