Author: Dann Corbit
Date: 12:15:12 10/12/05
On October 12, 2005 at 15:05:52, Dieter Buerssner wrote:

>On October 12, 2005 at 14:33:55, Dann Corbit wrote:
>
>>On October 12, 2005 at 12:38:09, Salvo Spitaleri wrote:
>>
>>>On October 12, 2005 at 12:31:03, Dann Corbit wrote:
>>>
>>>>On October 12, 2005 at 12:25:41, Salvo Spitaleri wrote:
>>>>
>>>>>Hello friends,
>>>>>
>>>>>How can I split huge PGN files into smaller files by player name?
>>>>>
>>>>>Can PGN-extract do it?
>>>>
>>>>Yes.
>>>>
>>>>So can SCID and many others too, I imagine.
>>>
>>>Hi Dann,
>>>
>>>I mean in an automatic way, for all the players in the file.
>>>SCID can only do it for one player at a time!
>>
>>Probably PGN Extract is better.
>>
>>1. Grep for player names with the "[White " and "[Black " tags
>>2. Create a sorted unique list of players from the tags
>>3. Use PGN-Extract to filter into groups from the players list.
>>
>>It goes without saying that the games will have lots of duplicates when
>>filtered in this way (e.g. Fischer versus Karpov will show up in the Fischer
>>file and in the Karpov file).
>
>Dann, this certainly looks like excellent advice. But it seems to need quite
>some work between the steps, work that almost needs a programmer. I did not
>try, but I guess the grep needs some escape for the "[". The result of the grep
>will need some trimming (getting rid of the quotation marks, [], White, Black).
>This seems the most difficult part to me.
>
>I don't know PGN-Extract well enough to judge how well it would work here. I
>fear it would take so many passes over the original PGN that it could be just
>too slow on a large PGN (say your "junkbase" with something around 3 GB).
>
>It would probably produce a huge number of files, and not all file systems
>would be able to handle this. Another complication might be "special"
>(non-English) characters inside the names (think of / or \).
>
>No doubt, all these problems are solvable. Perhaps PGN-extract already handles
>the latter issues well. However, it looks to me as if this is no easy "end
>user task".
>
>I would probably try to use my PGN parser and write it in C. I'd extract the
>two names of each game, open two files named after the trimmed names in append
>mode, and write the game there. Then close the files again. If there is no
>limit on the number of files, this might work in an hour or so. I have no idea
>whether the large number of open/close calls would make it too slow to be
>useful for really large PGNs.

You are right that several manual steps would be needed; I assumed at least some familiarity with editors and probably with search/replace. I also expect it would have to run overnight.
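Steps 1 and 2 of the recipe quoted above (grep the "[White " and "[Black " tags, then build a sorted unique player list), including the trimming Dieter calls the most difficult part, can be sketched in a few lines of Python instead of grep plus an editor. This is a minimal sketch, not part of PGN-Extract: the names `TAG_RE` and `player_names` are hypothetical, and it assumes each tag pair sits on its own line, as in export-format PGN. It also skips "?", which PGN uses for an unknown player.

```python
import re
from typing import Iterable, List

# A White or Black tag pair on its own line, e.g. [White "Fischer, Robert James"].
# The "[" must be escaped in the regular expression, as Dieter suspected for grep.
TAG_RE = re.compile(r'^\[(White|Black)\s+"(.*)"\]$')


def player_names(lines: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated player names from White/Black tags."""
    names = set()
    for line in lines:
        m = TAG_RE.match(line.strip())
        if not m:
            continue
        # Undo PGN string escapes (backslash-quote, backslash-backslash), then trim.
        name = re.sub(r'\\(.)', r'\1', m.group(2)).strip()
        # "?" marks an unknown player in PGN; it is not a useful group name.
        if name and name != "?":
            names.add(name)
    return sorted(names)
```

To use it on a real base, pass an open file, e.g. `player_names(open("big.pgn", encoding="latin-1"))` (the file name is hypothetical; Latin-1 is assumed because the PGN standard specifies ISO 8859-1). The file is read line by line, so only the set of names, not the games, has to fit in memory.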
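Dieter's single-pass alternative (extract both names from each game, open the two player files in append mode, write the game, close them again) can be sketched in Python rather than C to keep it short. The names `games`, `safe_filename` and `split_by_player` are hypothetical, and several assumptions are built in: a tag line that follows movetext starts a new game, files are written as Latin-1, and any character in a name other than letters, digits, space, comma, period and hyphen (for example / or \) is replaced by "_", so two names that differ only in such characters would share a file. A game between two different players is written to both files, which gives exactly the duplicates Dann mentions.

```python
import os
import re
from typing import Iterable, Iterator, List

# Any PGN tag pair on its own line, e.g. [White "Karpov, Anatoly"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def games(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group input lines into games: a tag line seen after movetext starts a new game."""
    game: List[str] = []
    in_moves = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and in_moves:
            yield game
            game, in_moves = [], False
        if stripped and not stripped.startswith("["):
            in_moves = True
        game.append(line if line.endswith("\n") else line + "\n")
    if game:
        yield game


def safe_filename(name: str) -> str:
    """Turn a player name into a file name; characters such as / and backslash become '_'."""
    cleaned = "".join(c if c.isalnum() or c in " ,.-" else "_" for c in name)
    cleaned = cleaned.strip(" .")  # avoid "", ".", ".." and trailing dots
    return (cleaned or "unknown") + ".pgn"


def split_by_player(lines: Iterable[str], out_dir: str) -> int:
    """Append every game to one file per player; return the number of games written."""
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    for game in games(lines):
        players = set()
        for line in game:
            m = TAG_RE.match(line.strip())
            if m and m.group(1) in ("White", "Black"):
                name = re.sub(r'\\(.)', r'\1', m.group(2)).strip()
                if name:
                    players.add(name)
        if not players:
            continue
        text = "".join(game)
        if not text.endswith("\n\n"):
            text += "\n"  # keep a blank line between games in the output files
        for player in players:
            # Open in append mode, write, and close again, as Dieter describes.
            path = os.path.join(out_dir, safe_filename(player))
            with open(path, "a", encoding="latin-1", errors="replace") as f:
                f.write(text)
        written += 1
    return written
```

The input is read only once, however large it is. Opening and closing a file per game keeps just one output file open at a time, at the cost of many open/close calls, which is the trade-off Dieter raises about speed on really large PGNs.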
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.