Author: Dann Corbit
Date: 22:43:29 03/04/05
On March 04, 2005 at 22:58:40, David H. McClain wrote:

>On March 04, 2005 at 15:51:55, Dann Corbit wrote:
>
>>Here is what I do:
>>1. I run the cleaner in SCID, which removes lots of duplicates.
>>2. I run pgn-extract by Barnes with -dnul to get rid of more duplicates. It
>>almost always finds some.
>>3. I run ChessAssistant's duplicate finder against the data and keep
>>"Essential" while removing "Discardable" from the set.
>>
>>I run one more cycle of the three steps above.
>>
>>After that, I do not find more duplicates.
>>
>>I expect that ChessBase will be similar to ChessAssistant.
>>
>>Finding duplicates is a very difficult thing to do, when you think about it.
>>
>>Two different players could play exactly the same moves -- especially in a short
>>game.
>>
>>Bobby Fischer might be spelled:
>>Bobby Fischer
>>Robert Fischer
>>Robert J. Fischer
>>B. Fischer
>>Fischer
>>R. J. Fischer
>>etc.
>>
>>To complicate things, chess programmers sometimes name their creations after
>>famous chess players.
>>
>>It is also inevitable that some of the duplicates thrown away will not really
>>be duplicates.
>
>Dan,
>
>Thank you. I guess I'm trying to split hairs with this and catch or save every
>last game. I was referring to only machine games so I guess the possibilities of
>incorrect names is greatly decreased. I should have mentioned that. For
>creating and editing to fine tune an opening book, I suppose a few lost games or
>a few duplicates won't make much difference as long as the games have their
>integrity in a data base that is small (~100,000 games) by today's standards.
>Since a much more experienced person than myself states similar difficulties,
>perhaps I should relax a bit! DHM

You will find lots and lots of name errors in machine-generated games, even from the most careful contest operators.
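To show why the name problem matters for duplicate finding, here is a rough Python sketch. It is not how SCID, pgn-extract, or ChessAssistant work internally. It assumes each game is already available as a dict with "white", "black", and "moves" fields. The helper names (move_key, surname_key, likely_duplicates) and the sample games are made up for illustration. Two games are flagged only if the move text matches after stripping comments, move numbers, and the result, and both players' surnames match.

```python
import hashlib
import re
from collections import defaultdict


def move_key(movetext):
    """Hash the moves alone, ignoring {comments}, move numbers, and the result."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)      # drop {comments}
    text = re.sub(r"\b\d+\.+", " ", text)           # drop "12." and "12..."
    text = re.sub(r"(1-0|0-1|1/2-1/2|\*)$", " ", text.strip())  # drop trailing result
    moves = " ".join(text.split())                  # collapse whitespace
    return hashlib.sha1(moves.encode("ascii", "ignore")).hexdigest()


def surname_key(name):
    """Crude guess at a surname: text before a comma, else the last word."""
    name = name.strip()
    if "," in name:
        surname = name.split(",", 1)[0].strip()
    else:
        parts = name.split()
        surname = parts[-1] if parts else ""
    return surname.lower()


def likely_duplicates(games):
    """Return lists of indexes of games that share players and moves."""
    groups = defaultdict(list)
    for index, game in enumerate(games):
        key = (
            surname_key(game["white"]),
            surname_key(game["black"]),
            move_key(game["moves"]),
        )
        groups[key].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


# Invented sample data, not taken from any real database.
sample_games = [
    {"white": "Robert J. Fischer", "black": "Opponent A",
     "moves": "1. e4 e5 2. Nf3 Nc6 1-0"},
    {"white": "Fischer, R.", "black": "A",
     "moves": "1.e4 e5 2.Nf3 {book} Nc6 1-0"},
    {"white": "Somebody Else", "black": "Opponent A",
     "moves": "1. e4 e5 2. Nf3 Nc6 1-0"},
]

print(likely_duplicates(sample_games))
```

The surname guess is deliberately simple and fails in exactly the ways described above: a misspelling such as "Fisher" will not match "Fischer", and an engine named after a player, or a name with a version number on the end, will be keyed wrongly. Relaxing the name check catches more true duplicates but also throws away more games that are not really duplicates.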
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.