Author: David H. McClain
Date: 19:58:40 03/04/05
On March 04, 2005 at 15:51:55, Dann Corbit wrote:

>Here is what I do:
>1. I run the cleaner in SCID, which removes lots of duplicates.
>2. I run pgn-extract by Barnes with -dnul to get rid of more duplicates. It
>almost always finds some.
>3. I run ChessAssistant's duplicate finder against the data and keep
>"Essential" while removing "Discardable" from the set.
>
>I run one more cycle of the three steps above.
>
>After that, I do not find more duplicates.
>
>I expect that ChessBase will be similar to ChessAssistant.
>
>Finding duplicates is a very difficult thing to do, when you think about it.
>
>Two different players could play exactly the same moves -- especially in a short
>game.
>
>Bobby Fisher might be spelled:
>Bobby Fischer
>Robert Fischer
>Robert J. Fischer
>B. Fischer
>Fischer
>R. J. Fischer
>etc.
>
>To complicate things, chess programmers sometimes name their creations after
>famous chess players.
>
>It is also inevitable that some of the duplicates thrown away will not really
>be duplicates.

Dann,

Thank you. I guess I'm trying to split hairs with this and catch or save every last game. I was referring only to machine games, so I guess the possibility of incorrect names is greatly decreased. I should have mentioned that.

For creating and fine-tuning an opening book, I suppose a few lost games or a few duplicates won't make much difference, as long as the games keep their integrity in a database that is small (~100,000 games) by today's standards.

Since a much more experienced person than myself reports similar difficulties, perhaps I should relax a bit!

DHM
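To make the two problems in the quoted post concrete (identical moves from different players, and one player under several spellings), here is a rough Python sketch. It is only an illustration and does not reflect how SCID, pgn-extract, or ChessAssistant actually detect duplicates. The game records (dicts with "white", "black", and "moves" keys) and the helper names normalize_moves, surname, and find_duplicates are all made up for this example, and the surname rule is a crude assumption that suits human names better than engine names with version numbers.

```python
import re
from collections import defaultdict

# Game-termination markers that are not moves.
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def normalize_moves(movetext):
    """Reduce PGN movetext to a bare tuple of move tokens."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    text = re.sub(r"\d+\.+", " ", text)         # drop move numbers: "12." or "12..."
    return tuple(t for t in text.split() if t not in RESULTS)

def surname(name):
    """Crude surname guess: text before a comma, else the last word."""
    name = name.strip()
    if "," in name:
        last = name.split(",")[0]
    else:
        parts = name.split()
        last = parts[-1] if parts else ""
    return last.strip().lower()

def find_duplicates(games):
    """Pair up games with identical moves and label each pair."""
    by_moves = defaultdict(list)
    for idx, game in enumerate(games):
        by_moves[normalize_moves(game["moves"])].append(idx)

    pairs = []
    for indices in by_moves.values():
        first = indices[0]
        for other in indices[1:]:
            same_players = (
                surname(games[first]["white"]) == surname(games[other]["white"])
                and surname(games[first]["black"]) == surname(games[other]["black"])
            )
            label = "likely duplicate" if same_players else "same moves only"
            pairs.append((first, other, label))
    return pairs

games = [
    {"white": "Robert J. Fischer", "black": "B. Spassky",
     "moves": "1. e4 e5 2. Nf3 Nc6 1-0"},
    {"white": "Fischer, R", "black": "Spassky",
     "moves": "1.e4 e5 2.Nf3 {book} Nc6 1-0"},
    {"white": "Crafty", "black": "Fritz",
     "moves": "1. e4 e5 2. Nf3 Nc6 1/2-1/2"},
    {"white": "A", "black": "B", "moves": "1. d4 d5 *"},
]

for pair in find_duplicates(games):
    print(pair)
```

Pairs labeled "same moves only" are the ones that probably deserve a manual look rather than automatic deletion, since two different players can reach the same short game. A misspelling such as "Fisher" would also land in that bucket, because the sketch does no fuzzy matching on names.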
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.