Computer Chess Club Archives



Subject: Re: Finding Duplicate Games: SCID vs. Chessbase

Author: Dann Corbit

Date: 12:51:55 03/04/05



On March 04, 2005 at 15:40:36, David H. McClain wrote:

>Is one any better than the other for finding duplicate games?  I have found
>SCID, while a very nice program, continually finds a few duplicate games, no
>matter how they are sorted, until finally the search for duplicates is satisfied
>that there are no more.  Am I missing something with SCID?  DHM

Here is what I do:
1.  I run the cleaner in SCID, which removes lots of duplicates.
2.  I run pgn-extract by Barnes with -dnul to get rid of more duplicates.  It
almost always finds some.
3.  I run ChessAssistant's duplicate finder against the data and keep
"Essential" while removing "Discardable" from the set.

I run one more cycle of the three steps above.

After that, I do not find more duplicates.
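The "repeat until nothing more is removed" idea can be sketched in a few lines of Python. This is only an illustration: the two passes below are toy stand-ins I made up, not what SCID, pgn-extract, or ChessAssistant actually do, and the (white, black, moves) record is a simplified, hypothetical layout.

```python
from typing import Callable, List, Tuple

# Hypothetical, simplified game record: (white, black, moves).
Game = Tuple[str, str, str]
Pass = Callable[[List[Game]], List[Game]]


def drop_exact_duplicates(games: List[Game]) -> List[Game]:
    """Toy pass: keep the first copy of each identical record."""
    seen = set()
    kept = []
    for game in games:
        if game not in seen:
            seen.add(game)
            kept.append(game)
    return kept


def drop_case_variants(games: List[Game]) -> List[Game]:
    """Toy pass: treat player names that differ only in case as the same."""
    seen = set()
    kept = []
    for white, black, moves in games:
        key = (white.lower(), black.lower(), moves)
        if key not in seen:
            seen.add(key)
            kept.append((white, black, moves))
    return kept


def run_cycles(games: List[Game], passes: List[Pass], max_cycles: int = 10):
    """Run every pass in order, and repeat until a full cycle removes nothing."""
    cycles = 0
    while cycles < max_cycles:
        before = len(games)
        for one_pass in passes:
            games = one_pass(games)
        cycles += 1
        if len(games) == before:
            break
    return games, cycles


games = [
    ("Fischer", "Spassky", "1. e4 e5"),
    ("Fischer", "Spassky", "1. e4 e5"),
    ("FISCHER", "Spassky", "1. e4 e5"),
    ("Tal", "Botvinnik", "1. d4 d5"),
]
cleaned, cycles = run_cycles(games, [drop_exact_duplicates, drop_case_variants])
```

With this toy data the first cycle does all the removing and the second one only confirms that nothing is left to remove.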

I expect that ChessBase will be similar to ChessAssistant.

Finding duplicates is a very difficult thing to do, when you think about it.

Two different players could play exactly the same moves -- especially in a short
game.
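A small Python sketch of why the moves alone are not enough as a key. The dictionary fields, player names, and dates are invented for the example, and the header comparison is a deliberately naive assumption.

```python
from collections import defaultdict


def group_by_moves(games):
    """Group games whose move text is identical after collapsing whitespace."""
    groups = defaultdict(list)
    for game in games:
        key = " ".join(game["moves"].split())
        groups[key].append(game)
    # Only groups with more than one game are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]


def same_headers(a, b):
    """Naive check: same players and same date (an assumption, not a rule)."""
    return (a["white"], a["black"], a["date"]) == (b["white"], b["black"], b["date"])


games = [
    {"white": "Player A", "black": "Player B", "date": "2005.03.01",
     "moves": "1. f3 e5 2. g4 Qh4#"},
    {"white": "Player A", "black": "Player B", "date": "2005.03.01",
     "moves": "1. f3 e5  2. g4 Qh4#"},
    {"white": "Player C", "black": "Player D", "date": "1999.07.15",
     "moves": "1. f3 e5 2. g4 Qh4#"},
]

candidates = group_by_moves(games)
```

All three games land in one candidate group, but only the first two agree on players and date. The third is a different game that happens to have the same moves.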

Bobby Fischer's name might be spelled:
Bobby Fischer
Robert Fischer
Robert J. Fischer
B. Fischer
Fischer
R. J. Fischer
etc.
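To show what a name matcher has to cope with, here is a rough Python sketch that accepts all of the spellings above as possibly the same player. The one-entry nickname table and the matching rule are my own assumptions for illustration, not how any of the tools mentioned work.

```python
# Hypothetical alias table; a real one would need many more entries.
NICKNAMES = {"bobby": "robert"}


def parse_name(name):
    """Split a name into (surname, given-name tokens), lowercased, no periods."""
    if "," in name:  # "Surname, Given" form
        surname, given = [part.strip() for part in name.split(",", 1)]
        tokens = given.split() + [surname]
    else:
        tokens = name.split()
    tokens = [t.strip(".").lower() for t in tokens if t.strip(".")]
    if not tokens:
        return "", []
    return tokens[-1], tokens[:-1]


def _variants(token):
    """Canonical forms a given-name token might stand for."""
    forms = {NICKNAMES.get(token, token)}
    if len(token) == 1:  # an initial may abbreviate a nickname ("B." for Bobby)
        forms |= {full for nick, full in NICKNAMES.items() if nick.startswith(token)}
    return forms


def may_be_same_player(a, b):
    """True if the surnames match and the given names do not contradict."""
    surname_a, given_a = parse_name(a)
    surname_b, given_b = parse_name(b)
    if surname_a != surname_b:
        return False
    for x, y in zip(given_a, given_b):
        if not any(p.startswith(q) or q.startswith(p)
                   for p in _variants(x) for q in _variants(y)):
            return False
    return True


spellings = ["Bobby Fischer", "Robert Fischer", "Robert J. Fischer",
             "B. Fischer", "Fischer", "R. J. Fischer"]
all_match = all(may_be_same_player(a, b) for a in spellings for b in spellings)
```

Note the price of being this tolerant: a bare "Fischer" matches anybody with that surname, so a rule like this will also produce false matches.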

To complicate things, chess programmers sometimes name their creations after
famous chess players.

It is also inevitable that some of the duplicates thrown away will not really
be duplicates.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
