Author: Steven Edwards
Date: 06:16:29 04/08/05
On April 08, 2005 at 00:23:15, Tor Alexander Lattimore wrote:

>First, is it alright to use enormous.pgn for a book?

Sure. But I would be careful with the MinPlayCount parameter, as there are a lot of duplications and a lot of questionable games.

>Secondly, i've been trying
>to parse it recently and my program seems to be doing fine until about 300,000
>games where it just returns EOF. I've tried opening and reading from other large
>files and get the same problem. Initially I tried using C++'s <iostream>
>library, but when that didn't work I tried standard C fopen() and fgetc() with
>no more success. The file is 900 MB, so shouldn't be a problem where windows
>does strange stuff with 2GB or > files.

When I first tried parsing a copy of enormous.pgn a year or so ago, I encountered a number of difficulties. Some that I remember:

1. There were more than a few characters outside the range [0x0a, 0x0d, 0x20..0x7e] in the data, and not all of these are inside character literals. Some appear between games.

2. There were some PGN tags I had never heard of, so I had to adjust my parser/compiler to handle these.

3. Be careful: there might be some tag names like "EventDate" that could trigger a false "Event" match.

4. I seem to recall that some of the recursive annotation variations were bogus (i.e., syntactically incorrect).

After cleaning up and de-duping my copy of enormous.pgn, I have the file e.pgn with 1400895 games:

    [cynthia:~/Arena/Symbolic/PGN] sje% wc e.pgn
    22724727 187736744 802942905 e.pgn

I could upload this to an ftp site if one is available.
Last modified: Thu, 15 Apr 21 08:11:13 -0700