Author: Tim Mirabile
Date: 22:32:46 08/09/99
Thanks for posting this here. I like the idea of saving disk space, but I would like to get some feedback from archive users before I jump into it.

One thing I would consider is concatenating each day's worth of files into a single file before zipping them. I could automate this in the same Perl script that creates the day's archive. Then the monthly archive would contain 31 files or fewer. The only problem with this is that people would no longer be able to download the messages as separate files. This may not be a concern for anyone, and if it is, it should be possible to format the combined message file so that it can be easily split.

On August 09, 1999 at 13:07:41, Ratko V Tomic wrote:

>The monthly archive files are stored as zip files
>containing hundreds of small txt files. This
>fragmentation has a large negative effect on both
>compression ratio (thus download time) and the disk
>space used by the decompressed txt files.
>Additionally, searching for some text in the large
>number of txt files is slower than searching for the
>same text in a single file.
>
>SUGGESTION: Merge all txt files from a single
>monthly archive into a single (or a few) txt files
>and ZIP the merged file.
>
>TEST: As an example of the savings, I took one
>typical monthly archive, M981130.ZIP, which is
>1147k and contains 1078 small text files.
>
>  Archive M981130.ZIP:      1147k  B1
>  Decompressed TXT files:   2104k
>  Disk space used by TXT:  17248k  A1 (waste due to disk granularity)
>  -----------------------------------
>  Merged X.TXT file:        2104k
>  Disk used by X.TXT:       2112k  A2
>  Zipped X.TXT size:         569k  B2
>  -----------------------------------
>  Ratio A1/A2:  8.2
>  Ratio B1/B2:  2.0
>  -----------------------------------
>
>Therefore, the decompressed files would use
>8.2 times less disk space than the current
>fragmented scheme (and allow much faster searches).
>
>The compressed files would save half the space for
>the zip files and take half the time to download.
>
>In order to accommodate users whose editors cannot
>read files of one Mb or more, the messages can be
>merged into chunks of 200-300k per file instead of
>a single monthly TXT file, which would still
>retain all the savings given in the example.
>
>The whole process of conversion can be automated;
>the method depends on the server's operating system.
>(If the site webmaster uses a Windows/MS-DOS system,
>I could provide utilities for fast automatic
>conversion.)
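
To make the "easily split" idea concrete, here is a minimal sketch in Python (the archive script itself is Perl, so this is only an illustration). The separator format, the function names and the sample file names are all hypothetical. It assumes no message contains a line that looks like the separator, and it normalizes each message to end with exactly one newline.

```python
import re

# Hypothetical separator line; any marker that cannot occur inside a message would do.
SEPARATOR = "=====MESSAGE {name}====="
SEPARATOR_RE = re.compile(r"^=====MESSAGE (.+)=====$")


def merge_messages(messages):
    """Join {filename: text} into one string, each message preceded by a separator line."""
    parts = []
    for name in sorted(messages):
        parts.append(SEPARATOR.format(name=name))
        parts.append(messages[name].rstrip("\n"))
    return "\n".join(parts) + "\n"


def split_messages(merged):
    """Inverse of merge_messages: recover {filename: text} from the merged string."""
    collected = {}
    current = None
    for line in merged.splitlines():
        match = SEPARATOR_RE.match(line)
        if match:
            # A separator line starts a new message.
            current = match.group(1)
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in collected.items()}


# Made-up sample data standing in for one day's message files.
day = {"msg001.txt": "first message\nsecond line\n", "msg002.txt": "another message\n"}
merged = merge_messages(day)
restored = split_messages(merged)
```

With a format like this, anyone who wants the individual messages back can run the split step on the downloaded daily file.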
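
The disk-granularity figures in the quoted table can be reproduced with a little arithmetic. The sketch below is in Python and rests on two assumptions that are not stated in the post: a 16k allocation unit (inferred from 17248k / 1078 files = 16k per file) and, because the individual file sizes are not given, every small file being the average size of 2104k / 1078.

```python
import math


def disk_usage_kb(file_sizes_kb, cluster_kb):
    """Disk space occupied when each file is rounded up to whole clusters."""
    return sum(math.ceil(size / cluster_kb) * cluster_kb for size in file_sizes_kb)


CLUSTER_KB = 16    # assumption: inferred from 17248k / 1078 files, not stated in the post
FILE_COUNT = 1078  # number of small text files, from the post
TOTAL_KB = 2104    # total decompressed size, from the post

# Assumption: all files have the average size (about 1.95k each).
small_files = [TOTAL_KB / FILE_COUNT] * FILE_COUNT

fragmented_kb = disk_usage_kb(small_files, CLUSTER_KB)  # A1 in the table
merged_kb = disk_usage_kb([TOTAL_KB], CLUSTER_KB)       # A2 in the table
ratio = fragmented_kb / merged_kb

print(fragmented_kb, merged_kb, round(ratio, 1))
```

Under those assumptions the calculation gives 17248k for the separate files and 2112k for the merged file, a ratio of about 8.2, which matches A1, A2 and A1/A2 in the table.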
Last modified: Thu, 15 Apr 21 08:11:13 -0700