Computer Chess Club Archives


Subject: Re: Announcement: Junkbase for SCID

Author: Dann Corbit

Date: 14:56:05 02/01/01


On February 01, 2001 at 14:36:59, Dann Corbit wrote:

>What is Junkbase for SCID?  It is a collection of a very large number of games
>gathered from public sources, cleaned, duplicate filtered, and put into SCID
>format.  The SCID database files are compressed with bzip, so they are smaller
>than even a zip file would be.
>
>Here are the data files for over 2 million PGN games in SCID format:
>ftp://cap.connx.com/pub/Scid/junkbase.sg.bz
>ftp://cap.connx.com/pub/Scid/junkbase.si.bz
>ftp://cap.connx.com/pub/Scid/junkbase.sn.bz
>
>Here is bzip, in case you don't have that to decompress them:
>ftp://cap.connx.com/pub/Scid/bzip.exe
>
>Compressed, they are this size:
>01/30/2001  03:31p         131,824,621 junkbase.sg.bz
>01/30/2001  03:36p          50,660,956 junkbase.si.bz
>01/30/2001  03:31p           3,734,485 junkbase.sn.bz
>               3 File(s)    186,220,062 bytes
>
>Decompressed, they are this size:
>01/30/2001  03:31p         276,177,816 junkbase.sg
>01/30/2001  03:36p          92,161,199 junkbase.si
>01/30/2001  03:31p           6,821,325 junkbase.sn
>               3 File(s)    375,160,340 bytes
>
>The raw PGN file was this size:
>01/26/2001  08:47p       1,584,017,928 junkbase.pgn
>
>Now, before you run off and get them, some warnings:
>1.  You have to have SCID installed.  Installing it is anything but trivial
>because it requires Tcl/Tk, which is a pain to install if you're not a techie.
>2.  These games are gathered from a bazillion public sources, and so the name
>fits.  Do not expect the sparkling quality of a commercial database.
>3.  I don't support this stuff.  It's a free present for anyone who wants it but
>I will definitely ignore any emails you send asking questions about it.
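
For a sense of scale, the sizes in the listings quoted above can be turned into ratios. This is only a rough sketch of the arithmetic: the byte counts are copied from the listings, and the variable and function names are made up for illustration.

```python
# Byte counts copied from the directory listings quoted above.
compressed = {
    "junkbase.sg.bz": 131_824_621,
    "junkbase.si.bz": 50_660_956,
    "junkbase.sn.bz": 3_734_485,
}
decompressed = {
    "junkbase.sg": 276_177_816,
    "junkbase.si": 92_161_199,
    "junkbase.sn": 6_821_325,
}
raw_pgn = 1_584_017_928  # junkbase.pgn

total_compressed = sum(compressed.values())
total_decompressed = sum(decompressed.values())


def percent_of(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"download total:      {total_compressed:,} bytes")
print(f"SCID files on disk:  {total_decompressed:,} bytes")
print(f"bzip'd vs SCID:      {percent_of(total_compressed, total_decompressed):.1f}%")
print(f"SCID vs raw PGN:     {percent_of(total_decompressed, raw_pgn):.1f}%")
print(f"bzip'd vs raw PGN:   {percent_of(total_compressed, raw_pgn):.1f}%")
```

The two totals agree with the "3 File(s)" lines in the listings.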

I added a new subdirectory called junkbase that holds the same files compressed
with bzip2.exe (a copy of bzip2.exe is in there too).
It saves a measly meg, but (hey) a meg is a meg.
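
If you would rather not run bzip2.exe by hand, the bzip2 versions can also be unpacked with Python's standard bz2 module. This is a minimal sketch, not a supported tool. The file names below (junkbase.sg.bz2 and so on) are an assumption, since the exact names in the subdirectory are not listed here, and decompress_bz2 is just an illustrative helper. As far as I know, the bz2 module reads only the bzip2 format, so it should not be expected to handle the older .bz files made with bzip.exe.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src: Path, dest: Path, chunk_size: int = 1024 * 1024) -> int:
    """Stream-decompress a bzip2 file to dest and return the bytes written."""
    # Copy in chunks so a file of several hundred megabytes never has to
    # fit in memory all at once.
    with bz2.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest.stat().st_size


if __name__ == "__main__":
    # Hypothetical file names; adjust to whatever the download is called.
    for suffix in ("sg", "si", "sn"):
        src = Path(f"junkbase.{suffix}.bz2")
        if not src.exists():
            print(f"skipping {src}: not found")
            continue
        dest = src.with_suffix("")  # junkbase.sg.bz2 -> junkbase.sg
        size = decompress_bz2(src, dest)
        print(f"{dest}: {size:,} bytes")
```

The sizes printed should match the decompressed listing above if the download arrived intact.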




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.