Computer Chess Club Archives


Subject: Re: Search for a tool

Author: Dieter Buerssner

Date: 12:05:52 10/12/05


On October 12, 2005 at 14:33:55, Dann Corbit wrote:

>On October 12, 2005 at 12:38:09, Salvo Spitaleri wrote:
>
>>On October 12, 2005 at 12:31:03, Dann Corbit wrote:
>>
>>>On October 12, 2005 at 12:25:41, Salvo Spitaleri wrote:
>>>
>>>>Hello friends,
>>>>
>>>>How can I split huge PGN files into smaller files by player name?
>>>>
>>>>Can PGN-extract do it?
>>>
>>>Yes.
>>>
>>>So can SCID and many others too, I imagine.
>>
>>Hi Dann,
>>
>>I mean in an automatic way, for all the players in the file.
>>SCID can do it only for one player at a time!
>
>Probably PGN Extract is better.
>
>1. Grep for player names with the "[White " and "[Black " tags
>2. Create a sorted unique list of players from the tags
>3. Use PGN-Extract to filter into groups from the players list.
>
>It goes without saying that the games will have lots of duplicates when filtered
>in this way (e.g. Fischer versus Karpov will show up in the Fischer file and in
>the Karpov file).

Dann, this certainly looks like excellent advice. But there seems to be quite
a bit of work between the steps, work that almost calls for a programmer. I did
not try it, but I guess the grep pattern needs an escape for the "[". The grep
output will also need some trimming (removing the quotation marks, the brackets,
and the White/Black tag names). This seems the most difficult part to me.
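
To illustrate the trimming step, here is a minimal sketch in C that pulls the
player name out of a single tag line. The function name parse_player_tag is
made up for this example, and it assumes the tag starts at the beginning of the
line in the usual export format, with \" and \\ as the only escapes inside the
quoted value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if line is a [White "..."] or [Black "..."] tag,
   copy the quoted value into name (truncated to name_size - 1 chars,
   always NUL-terminated) and return 1. Return 0 for any other line or
   when the closing quote is missing. */
int parse_player_tag(const char *line, char *name, size_t name_size)
{
    const char *p;
    size_t n = 0;

    assert(line != NULL && name != NULL && name_size > 0);

    /* The trailing space keeps tags such as [WhiteElo ...] from matching. */
    if (strncmp(line, "[White \"", 8) == 0 ||
        strncmp(line, "[Black \"", 8) == 0)
        p = line + 8;
    else
        return 0;

    while (*p != '\0' && *p != '"') {
        /* A backslash escapes a following quote or backslash. */
        if (*p == '\\' && (p[1] == '"' || p[1] == '\\'))
            p++;
        if (n + 1 < name_size)
            name[n++] = *p;
        p++;
    }
    name[n] = '\0';
    return *p == '"';
}
```

Calling this on every line of the PGN and collecting the names would replace
the grep and the trimming in one pass.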

I don't know PGN-Extract well enough to judge how well it would work here. I
fear it would need many passes over the original PGN, which could be just too
slow on a large PGN (say your "junkbase" at around 3 GB).

It would probably produce a great many files, and not all file systems would be
able to handle this. Another complication might be "special" characters inside
the names: non-English letters, or characters such as / or \, which are path
separators and cannot appear in a file name.
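
One possible way around the file name problem, sketched in C: map every
character that is not an ASCII letter, digit, '-' or '_' to '_' and add a
".pgn" extension. The name make_filename and the replacement rule are
assumptions for illustration only. Note that this can map two different
players to the same file (non-ASCII bytes all become '_'), so a real tool
would need something smarter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a conservative file name from a player name.
   Writes "<sanitized name>.pgn" into out and returns 1, or returns 0 when
   out_size is too small to hold at least one character plus ".pgn". */
int make_filename(const char *player, char *out, size_t out_size)
{
    static const char ext[] = ".pgn";   /* sizeof ext counts the NUL */
    size_t n = 0;
    size_t room;

    assert(player != NULL && out != NULL);

    if (out_size < sizeof ext + 1)
        return 0;
    room = out_size - sizeof ext;       /* space left for name characters */

    for (; *player != '\0' && n < room; player++) {
        unsigned char c = (unsigned char)*player;
        /* Keep only characters that are safe on common file systems. */
        out[n++] = (isalnum(c) || c == '-' || c == '_') ? (char)c : '_';
    }
    if (n == 0)
        out[n++] = '_';                 /* empty name still gets a file */
    memcpy(out + n, ext, sizeof ext);   /* copies the terminating NUL too */
    return 1;
}
```

Long names are simply truncated to fit the buffer, which is another possible
source of collisions.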

No doubt all these problems are solvable. Perhaps PGN-Extract already handles
the latter issues well. However, it does not look like an easy "end user task"
to me.

I would probably try to use my PGN parser and write it in C. For each game, I'd
extract the two names, open two files with the trimmed names in append mode,
write the game to both, and close the files again. If there is no limit on the
number of files, this might work in an hour or so. I have no idea whether the
large number of open/close calls would make the thing too slow to be useful for
really large PGNs.
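
The open/append/close step described above could look roughly like this in C.
It is only a sketch: append_text and file_game are names invented here, the
two paths are assumed to have been derived from the player names already, and
the complete text of one game is assumed to be in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append text to the file at path, creating the file if it does not
   exist. Returns 0 on success, -1 on any error. */
int append_text(const char *path, const char *text)
{
    FILE *f;
    int ok;

    assert(path != NULL && text != NULL);

    f = fopen(path, "a");               /* "a": all writes go to the end */
    if (f == NULL)
        return -1;
    ok = fputs(text, f) != EOF;
    if (fclose(f) != 0)                 /* close right away, as described */
        ok = 0;
    return ok ? 0 : -1;
}

/* Write one game to the White player's file and the Black player's file.
   If both paths are the same, the game is written only once. */
int file_game(const char *white_path, const char *black_path,
              const char *game)
{
    if (append_text(white_path, game) != 0)
        return -1;
    if (strcmp(white_path, black_path) == 0)
        return 0;
    return append_text(black_path, game);
}
```

Since every game costs two fopen/fclose pairs, the speed concern applies
directly here. Keeping a small cache of recently used file handles open would
be one way to reduce that, at the cost of more code.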

Regards,
Dieter



