Computer Chess Club Archives


Subject: Re: Finding Duplicate Games: SCID vs. Chessbase

Author: Dann Corbit

Date: 22:43:29 03/04/05


On March 04, 2005 at 22:58:40, David H. McClain wrote:

>On March 04, 2005 at 15:51:55, Dann Corbit wrote:
>
>>Here is what I do:
>>1.  I run the cleaner in SCID, which removes lots of duplicates.
>>2.  I run pgn-extract by Barnes with -dnul to get rid of more duplicates.  It
>>almost always finds some.
>>3.  I run ChessAssistant's duplicate finder against the data and keep
>>"Essential" while removing "Discardable" from the set.
>>
>>I run one more cycle of the three steps above.
>>
>>After that, I do not find more duplicates.
>>
>>I expect that ChessBase will be similar to ChessAssistant.
>>
>>Finding duplicates is a very difficult thing to do, when you think about it.
>>
>>Two different players could play exactly the same moves -- especially in a short
>>game.
>>
>>Bobby Fischer might be spelled:
>>Bobby Fischer
>>Robert Fischer
>>Robert J. Fischer
>>B. Fischer
>>Fischer
>>R. J. Fischer
>>etc.
>>
>>To complicate things, chess programmers sometimes name their creations after
>>famous chess players.
>>
>>It is also inevitable that some of the duplicates thrown away will not really
>>be duplicates.
>
>Dann,
>
>Thank you.  I guess I'm trying to split hairs with this and catch or save every
>last game.  I was referring to only machine games so I guess the possibility of
>incorrect names is greatly decreased.  I should have mentioned that.  For
>creating and editing to fine tune an opening book, I suppose a few lost games or
>a few duplicates won't make much difference as long as the games have their
>integrity in a database that is small (~100,000 games) by today's standards.
>Since a much more experienced person than myself states similar difficulties,
>perhaps I should relax a bit!  DHM

You will find lots and lots of name errors in machine-generated games, even by
the most careful contest operators.
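
To make the name problem concrete, here is a rough Python sketch of the kind of matching a duplicate finder has to attempt. It is only an illustration and not how SCID, pgn-extract, or ChessAssistant actually work. The helper names (name_key, moves_key, find_candidate_duplicates) and the dict layout for a game are made up for this example. It keys each game on the two surnames plus the bare move list, so "Bobby Fischer" and "R. J. Fischer" land in the same bucket.

```python
import re
from collections import defaultdict

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def name_key(name):
    """Crude comparison key (hypothetical helper): lowercased surname only."""
    name = name.strip()
    if "," in name:
        # "Fischer, Robert J." style: surname comes before the comma.
        surname = name.split(",", 1)[0]
    else:
        # "Robert J. Fischer" style: assume the last word is the surname.
        parts = name.split()
        surname = parts[-1] if parts else ""
    return re.sub(r"[^a-z]", "", surname.lower())

def moves_key(movetext):
    """Strip comments, NAGs, move numbers and the result, leaving SAN moves."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # {brace comments}
    text = re.sub(r"\$\d+", " ", text)          # NAGs such as $1
    text = re.sub(r"\d+\.+", " ", text)         # move numbers: "1." or "1..."
    return " ".join(t for t in text.split() if t not in RESULTS)

def find_candidate_duplicates(games):
    """Return lists of indexes of games that share players and moves.

    Each game is assumed to be a dict with "white", "black" and "moves".
    """
    groups = defaultdict(list)
    for index, game in enumerate(games):
        key = (name_key(game["white"]),
               name_key(game["black"]),
               moves_key(game["moves"]))
        groups[key].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]
```

Note that a surname-only key cuts both ways. It will also merge two different players who share a surname, or a human and an engine named after him, if they happen to play the same moves. That is exactly why some of the "duplicates" thrown away are not really duplicates, and why the results are best treated as candidates for review.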



Last modified: Thu, 15 Apr 21 08:11:13 -0700
