Author: Robert Hyatt
Date: 13:21:12 10/12/05
On October 12, 2005 at 15:05:52, Dieter Buerssner wrote:
>On October 12, 2005 at 14:33:55, Dann Corbit wrote:
>
>>On October 12, 2005 at 12:38:09, Salvo Spitaleri wrote:
>>
>>>On October 12, 2005 at 12:31:03, Dann Corbit wrote:
>>>
>>>>On October 12, 2005 at 12:25:41, Salvo Spitaleri wrote:
>>>>
>>>>>Hello friends,
>>>>>
>>>>>How can I split huge PGN files into smaller files by player name?
>>>>>
>>>>>Can PGN-extract do it?
>>>>
>>>>Yes.
>>>>
>>>>So can SCID and many others too, I imagine.
>>>
>>>Hi Dann,
>>>
>>>I mean in an automatic way for all the players in the file.
>>>SCID can do it for only one player at a time!
>>
>>Probably PGN Extract is better.
>>
>>1. Grep for player names with the "[White " and "[Black " tags
>>2. Create a sorted unique list of players from the tags
>>3. Use PGN-Extract to filter into groups from the players list.
>>
>>It goes without saying that the games will have lots of duplicates when filtered
>>in this way (e.g. Fischer versus Karpov will show up in the Fischer file and in
>>the Karpov file).
>
>Dann, this certainly looks like excellent advice. But it seems to need quite
>a bit of work between the steps, work that almost requires a programmer. I did
>not try, but I guess the grep needs some escape for the "[". The result of the
>grep will need some trimming (getting rid of the quotation marks, [], White,
>Black). This seems the most difficult part to me.
If you don't need regular expressions, you could use "fgrep", which does
not treat the "[" in any special way at all.

awk makes it trivial to take something like

[White "Hyatt, Robert"]

and turn it into

Hyatt, Robert

as in

awk -F\" '{print $2}'

which gets rid of the leading [White " and the trailing "] parts...
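
Putting the fgrep and awk pieces above together, the name-list step could be sketched roughly as below. This is only a minimal sketch: the helper name list_players is made up for illustration, "grep -F" is used as the standard spelling of fgrep, and it assumes one tag per line and no double quotes embedded in the names.

```shell
#!/bin/sh
# list_players FILE
# Print a sorted, unique list of the player names found in the
# [White "..."] and [Black "..."] tags of a PGN file.
# (list_players is a hypothetical helper name, not part of any tool.)
list_players() {
    # -F: fixed strings, so "[" needs no escaping.
    # The trailing space and quote keep tags such as [WhiteElo "..."] out.
    grep -F -e '[White "' -e '[Black "' "$1" |
        # With the double quote as field separator, field 2 is the name.
        awk -F'"' '{ print $2 }' |
        # One line per distinct player.
        sort -u
}

# Example use (the file names are placeholders):
#   list_players big.pgn > players.txt
```

The resulting list is what would then be fed, one name at a time, to whatever filter does the actual extraction.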
>
>I don't know PGN-Extract well enough to judge how well it would work here. I'd
>fear it would take many passes over the original PGN, so that it could be just
>too slow on a large PGN (say your "junkbase" with something around 3 GB).
>
>It would probably produce a very large number of files, and not all file
>systems would be able to handle this. Another complication might be "special"
>characters inside the names (non-English letters, or think of / or \).
>
>No doubt all these problems are solvable. Perhaps PGN-extract already
>handles the latter issues well. However, it does not look like an easy
>"end user task" to me.
>
>I would probably try to use my PGN-parser and write it in C. I'd try to extract
>the two names of each game, open two files with trimmed names in append mode,
>and write the game there. Then close the files again. When there is no limit
>on the number of files, this might work in an hour or so. No idea if the
>large number of open/close calls would make the thing too slow to be useful for
>really large PGNs.
>
>Regards,
>Dieter
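
The one-pass idea quoted above (pull both names out of each game, append the game to two files, close them again) does not strictly need C; a rough awk version might look like the following. Treat it as a sketch under assumptions, not a tested tool: split_by_player is a made-up name, every game is assumed to start with an [Event tag, the output directory must already exist, and the crude name trimming (everything that is not an ASCII letter or digit becomes "_") can map two different players to the same file.

```shell
#!/bin/sh
# split_by_player FILE OUTDIR
# One pass over FILE: each game is appended to OUTDIR/<white>.pgn and
# OUTDIR/<black>.pgn. (split_by_player is a hypothetical helper name.)
split_by_player() {
    awk -v dir="$2" '
    # Build a safe file name: runs of unusual characters (space, comma,
    # slash, backslash, non-ASCII letters) collapse to a single "_".
    function fname(name) {
        if (name == "") name = "unknown"
        gsub(/[^A-Za-z0-9]+/, "_", name)
        return dir "/" name ".pgn"
    }
    # Append the buffered game to a file and close it right away, so the
    # number of simultaneously open files never grows.
    function append_to(f) {
        printf "%s", game >> f
        close(f)
    }
    # Write out the game collected so far, then reset the buffer.
    function flush_game() {
        if (game == "") return
        append_to(fname(white))
        if (black != white) append_to(fname(black))
        game = ""; white = ""; black = ""
    }
    /^\[Event "/ { flush_game() }                      # a new game starts
    /^\[White "/ { split($0, q, "\""); white = q[2] }
    /^\[Black "/ { split($0, q, "\""); black = q[2] }
    { game = game $0 "\n" }                            # buffer every line
    END { flush_game() }
    ' "$1"
}

# Example use (the names are placeholders):
#   mkdir players && split_by_player big.pgn players
```

Whether the open/close on every game is fast enough for a multi-gigabyte file is exactly the open question raised above; this sketch does nothing to answer it.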