Computer Chess Club Archives



Subject: Re: Finding Duplicate Games: SCID vs. Chessbase

Author: David H. McClain

Date: 19:58:40 03/04/05



On March 04, 2005 at 15:51:55, Dann Corbit wrote:

>Here is what I do:
>1.  I run the cleaner in SCID, which removes lots of duplicates.
>2.  I run pgn-extract by Barnes with -dnul to get rid of more duplicates.  It
>almost always finds some.
>3.  I run ChessAssistant's duplicate finder against the data and keep
>"Essential" while removing "Discardable" from the set.
>
>I run one more cycle of the three steps above.
>
>After that, I do not find more duplicates.
>
>I expect that ChessBase will be similar to ChessAssistant.
>
>Finding duplicates is a very difficult thing to do, when you think about it.
>
>Two different players could play exactly the same moves -- especially in a short
>game.
>
>Bobby Fischer's name might be spelled:
>Bobby Fischer
>Robert Fischer
>Robert J. Fischer
>B. Fischer
>Fischer
>R. J. Fischer
>etc.
>
>To complicate things, chess programmers sometimes name their creations after
>famous chess players.
>
>It is also inevitable that some of the duplicates thrown away will not really
>be duplicates.

Dann,

Thank you.  I guess I'm trying to split hairs with this and catch or save every
last game.  I was referring only to machine games, so I guess the possibility of
incorrect names is greatly decreased.  I should have mentioned that.  For
creating and fine-tuning an opening book, I suppose a few lost games or
a few duplicates won't make much difference, as long as the games keep their
integrity in a database that is small (~100,000 games) by today's standards.
Since a much more experienced person than myself reports similar difficulties,
perhaps I should relax a bit!  DHM
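
To make the difficulty discussed above concrete, here is a minimal Python sketch of one possible duplicate check. It is not how SCID, pgn-extract, ChessAssistant, or ChessBase actually work; the function names (`surname`, `move_key`, `find_duplicates`) and the game records (plain dicts with `white`, `black`, and `moves`) are hypothetical, chosen only for illustration. The assumption is that two games are duplicates when the surnames of both players match and the move sequences are identical once comments, move numbers, and results are stripped.

```python
import re
from collections import defaultdict

def surname(name: str) -> str:
    """Reduce a player name to a lowercase surname (a crude assumption)."""
    name = name.strip()
    if "," in name:
        # "Fischer, Robert J." -> the surname precedes the comma
        last = name.split(",", 1)[0]
    else:
        # "Robert J. Fischer" -> the surname is taken to be the last word
        parts = name.split()
        last = parts[-1] if parts else ""
    return re.sub(r"[^a-z]", "", last.lower())

def move_key(movetext: str) -> str:
    """Normalize PGN movetext so formatting differences do not matter."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {comments}
    text = re.sub(r"\d+\.+", " ", text)          # drop move numbers: "1." or "1..."
    results = {"1-0", "0-1", "1/2-1/2", "*"}
    moves = []
    for token in text.split():
        if token in results or token.startswith("$"):
            continue                              # skip results and NAGs like $1
        token = token.rstrip("+#!?")              # drop check/annotation suffixes
        if token:
            moves.append(token)
    return " ".join(moves)

def find_duplicates(games):
    """Return lists of indices of games that share players and moves."""
    groups = defaultdict(list)
    for index, game in enumerate(games):
        key = (surname(game["white"]), surname(game["black"]),
               move_key(game["moves"]))
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]

games = [
    {"white": "Robert J. Fischer", "black": "Boris Spassky",
     "moves": "1. e4 e5 2. Nf3 Nc6 1-0"},
    {"white": "Fischer, R.", "black": "Spassky, B",
     "moves": "1.e4 e5 2.Nf3 {a note} Nc6 1-0"},
    {"white": "Crafty", "black": "Fritz",
     "moves": "1. e4 e5 2. Nf3 Nc6 1-0"},
]
duplicates = find_duplicates(games)
```

In this sketch the first two records land in the same group, while the third has the same moves but different players and is kept apart. The surname rule shows why false positives are unavoidable: it would merge two different players named Fischer, and it would mishandle engine names that end in a version number. For machine-only games, comparing the full lowercased engine name may be a safer key.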




Last modified: Thu, 15 Apr 21 08:11:13 -0700
