Computer Chess Club Archives


Subject: Re: Show it's predictive power.

Author: Charles Unruh

Date: 17:36:11 12/10/99


On December 10, 1999 at 19:04:11, Bertil Eklund wrote:

>On December 10, 1999 at 15:24:45, Charles Unruh wrote:
>
>>On December 10, 1999 at 14:49:07, Dann Corbit wrote:
>>
>>>On December 10, 1999 at 14:34:22, Charles Unruh wrote:
>>>>On December 10, 1999 at 13:24:32, Roger wrote:
>>>[snip]
>>>>>Recalibration is inevitable, because GM-computer games will become much more
>>>>>common.
>>>>
>>>>
>>>>Well, not inevitable, as no one is currently doing it, but hopefully it will get
>>>>done soon. We may already have enough Rebel games to start with; if not, then
>>>>a few more games against a few willing masters on ICC will give us enough to
>>>>rate that program.  As for Junior, I suspect there are enough games, or close
>>>>to it. With a few games on ICC, or FICS for that matter, it would be a good
>>>>start toward recalibrating the SSDF, and would finally prove that I was right
>>>>all along (heck, that ICC was right, based on the ICC poll we had concerning GM
>>>>strength.  Most people thought comps were GM strength!) :).
>>>
>>>Rebel's GM challenge is an example where human players of known strength are
>>>pitted against a computer under tournament conditions and time control with
>>>money at stake.
>>>
>>>So you are mistaken that nobody is doing it.
>>
>>No, I am not mistaken.  I said that nobody was recalibrating the SSDF.  I said
>>that with these results the SSDF could be recalibrated, which was the point of
>>me starting the thread!!
>
>Ok, you are right, and totally wrong about the SSDF list: it is (was) calibrated
>against human players under tournament conditions and not in single-game
>matches

I never said that it was.  All I said is that we could come up with a way to
rate some programs, in this case Rebel, and use that to recalibrate the SSDF
ratings.
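
One simple way to picture the recalibration described here is a constant shift of the whole list. This is only a sketch under stated assumptions: it assumes a single offset is enough, and it is not a description of how the SSDF actually computes or would adjust its ratings. The function names, program names, and all numbers below are hypothetical.

```python
def recalibration_offset(anchors):
    """Mean difference (human-derived rating minus list rating).

    anchors: list of (list_rating, human_rating) pairs for programs
    that have been rated both ways. Hypothetical helper, not an SSDF method.
    """
    if not anchors:
        raise ValueError("need at least one anchor program")
    return sum(human - listed for listed, human in anchors) / len(anchors)


def recalibrate(ratings, offset):
    """Shift every rating on the list by the same offset."""
    return {name: rating + offset for name, rating in ratings.items()}


# Made-up numbers, for illustration only.
list_ratings = {"ProgramA": 2600.0, "ProgramB": 2550.0, "ProgramC": 2500.0}
# ProgramA and ProgramB are assumed to also have ratings from human games.
anchors = [(2600.0, 2520.0), (2550.0, 2490.0)]

offset = recalibration_offset(anchors)
adjusted = recalibrate(list_ratings, offset)
```

With more anchor programs the offset is averaged over more evidence, which is why having more than one program tested against humans would help.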

>with double-increment time controls. It's like comparing a 1500 m runner
>against a participant in a heptathlon contest in his last event (sorry, I don't
>know the name). I guess that, like me, a lot of people have participated in
>tournaments in another town (country) and remember how they felt in the last
>round after a lot of dining and wining. Compare these results against the weekly
>play in your club. At least a computer can!

I'm not sure what you are trying to say.
>
>Bertil SSDF
>>>However, it will require that the
>>>SSDF use hardware equivalent to what Ed is using for some tests in order to
>>>understand how the tests relate to one another.
>>
>>I do believe that for total accuracy we would of course need to test against
>>Rebel on the hardware that was used.  Though I think some calculation could
>>be done to estimate roughly how much strength Rebel might lose on the lesser
>>hardware, this would certainly not be the most recommended thing to do.
>>
>>>Also, it is a single program
>>>and there are not a lot of games.  But no one besides Ed seems to be ready to
>>>put their money where their mouth is.
>>
>>Well, that's fine that it's a single program (though I did mention that Junior
>>seems to have a good number of games too).  If I were 1600 and played 20 games
>>against a 2200 and lost every game, that wouldn't tell me much about my strength.
>>However, if I were 2160 strength and lost some and drew some in a 20-game
>>match vs a 2200, that would tell me something about my strength.  Since
>>the progs follow this second scenario of being relatively close in strength, 20
>>games against a single program, say H7 vs Rebel (with the rating calculated from
>>human games), would tell me something (at least something more) about the
>>strength of H7.  It would of course be more ideal to have at least 3 progs
>>tested vs humans.  I suggested a starting point of Junior and Rebel.  H7 may
>>have some games on record; I don't know of them.  I picked Rebel because the
>>games are in a series and aren't being artificially selected.  Meaning
>>that I could probably round up 6 games of H6, but that would be artificial,
>>because I could pick all wins, which wouldn't make sense to do.  Though I do
>>think it would be OK to, say, add together 2 tournaments.  Say H7 played in
>>tournament "A" and tournament "B"; then as long as I included all games, I feel
>>that it would be satisfactory to add "A"+"B" to calculate a rating.  As I was
>>saying, I think Junior may have a record that fits the just-mentioned scenario.
>>>
>>>I don't think that ICC or FICS games should be used to try to calibrate a
>>>rating.  I doubt that the GMs are very serious most of the time, and I expect that
>>>they are using the computers as sounding boards to test strategies for tactical
>>>weaknesses.  If I were a GM, that's what I would use them for.
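
The 1600-vs-2200 and 2160-vs-2200 argument quoted above can be made concrete with the standard Elo expected-score formula. The sketch below is illustrative only: the function names are made up for this example, and the 8/20 match result is a hypothetical score, not a result from any game discussed in this thread.

```python
import math


def expected_score(rating, opponent):
    """Standard Elo expected score per game (between 0 and 1)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(points, games, opponent):
    """Rating implied by a match score against one opponent of known rating.

    Returns None for a 0% or 100% score, because the logistic formula
    gives no finite estimate there: the result only bounds the rating.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = points / games
    if p <= 0.0 or p >= 1.0:
        return None
    return opponent + 400.0 * math.log10(p / (1.0 - p))


# A 1600 player expects only about 3% per game against a 2200, so a 0/20
# sweep is the likely outcome and pins down very little.
far_apart = expected_score(1600, 2200)
# A 2160 player expects about 44% per game against a 2200.
close = expected_score(2160, 2200)

swept = performance_rating(0, 20, 2200)      # None: no finite estimate
# Hypothetical 8/20 result (e.g. some draws, some losses, a few wins).
informative = performance_rating(8, 20, 2200)
```

A lopsided match only says "much weaker than 2200", while a close match yields a finite estimate near the opponent's rating, which is the point being made about testing programs of similar strength against each other.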



Last modified: Thu, 15 Apr 21 08:11:13 -0700
