Computer Chess Club Archives


Subject: Re: Removing duplicate games from PGN databases

Author: Dann Corbit

Date: 14:45:14 02/13/01


On February 13, 2001 at 17:36:23, John Hatcher wrote:

>Is anyone aware of a freely available program/utility that can remove duplicate
>games from a large PGN database? (i.e., 500,000 games)
>
>Chessbase Lite is limited to 8,000 games per database, and Rebel Decade 3.0
>requires the PGN database to be converted to Rebel's *.dat format before it can
>remove dupes (and I'm none too confident that Rebel is flexible enough to find
>all dupes).
>
>I'd like to remove the dupes directly from the PGN file.

SCID can do what you want, but it won't operate directly on the PGN and it is
not for the timid.  On the other hand, it does a very good job of finding
duplicates.  Of the tools I have tried, I think Chess Assistant is the most
effective duplicate finder.

What you may really want is EXTRACT by David Barnes.  You can find a copy of it
here:
ftp://cap.connx.com/pub/chess-engines/new-approach/EXTRACT.EXE
ftp://cap.connx.com/pub/chess-engines/new-approach/EXTRACT.ZIP

The zip file has the source code, which you probably couldn't care less about,
but it also has the directions and the ECO classification file, which you may
need for some of your operations.
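
For anyone curious what removing duplicates directly from a PGN file involves,
here is a minimal Python sketch.  It is not how EXTRACT, SCID or Chess Assistant
work; it is only an illustration under some stated assumptions: every game
starts with an [Event tag (as in PGN export format), the whole file fits in
memory, and two games count as duplicates when their move lists match after
comments, variations, NAGs, move numbers and results are stripped.  That last
rule ignores the headers, so two different games with identical moves (short
draws, for example) would be merged.  The function names and file names are
hypothetical.

```python
import re

def split_games(pgn_text):
    """Split PGN text into games, assuming each game begins with an [Event tag."""
    games, current = [], []
    for line in pgn_text.splitlines():
        # A new [Event line closes the game collected so far, if any.
        if line.startswith("[Event ") and any(l.strip() for l in current):
            games.append("\n".join(current).strip())
            current = []
        current.append(line)
    if any(l.strip() for l in current):
        games.append("\n".join(current).strip())
    return games

def move_key(game):
    """Reduce a game to its bare move list so formatting differences vanish."""
    body = "\n".join(l for l in game.splitlines() if not l.startswith("["))
    body = re.sub(r"\{[^}]*\}", " ", body)        # {brace comments}
    body = re.sub(r";[^\n]*", " ", body)          # ; rest-of-line comments
    while True:                                   # (variations), innermost first
        stripped = re.sub(r"\([^()]*\)", " ", body)
        if stripped == body:
            break
        body = stripped
    body = re.sub(r"\$\d+", " ", body)            # numeric annotation glyphs
    body = re.sub(r"1-0|0-1|1/2-1/2|\*", " ", body)  # game results
    body = re.sub(r"\b\d+\.+", " ", body)         # move numbers such as 12. or 12...
    body = re.sub(r"[!?+#]", "", body)            # annotation and check marks
    return " ".join(body.split())

def remove_duplicates(pgn_text):
    """Keep the first game seen for each distinct move list."""
    seen, kept = set(), []
    for game in split_games(pgn_text):
        key = move_key(game)
        if key not in seen:
            seen.add(key)
            kept.append(game)
    return "\n\n".join(kept) + "\n"

# Example use (hypothetical file names):
# with open("games.pgn", encoding="latin-1") as src:
#     cleaned = remove_duplicates(src.read())
# with open("games_nodupes.pgn", "w", encoding="latin-1") as dst:
#     dst.write(cleaned)
```

A stricter key, for example the move list combined with the White and Black
tags, would avoid merging distinct games that happen to share the same moves,
at the cost of missing duplicates whose player names are spelled differently.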



Last modified: Thu, 15 Apr 21 08:11:13 -0700
