Computer Chess Club Archives


Search

Terms

Messages

Subject: Not quite so off topic (compressing table bases)

Author: Ratko V Tomic

Date: 07:41:53 01/03/06

Go up one level in this thread


It may appear as an off topic post, but folks here encoding tablebases may wish
to check it out (see the source code on the QI web page). For that type of data
the QI is much better than Huffman or Arithmetic coding. On very sparse data
(very skewed symbol distribution) QI codes at the speed of run-length coder
(which is couple orders of magnitude faster than Arithmetic, even than Huffman
coder), yet more optimally and without fragility in case of denser clusters of
data. See for example row Vary in the preprint (page 10) where sudden changes in
symbol frequencies don't affect QI almost at all, while throwing off adaptive
Arithmetic coder completely off track, so it was outputing over twice the size
of QI's.

Additional benefit for table bases is that, unlike Huffman or Arithmetic coders,
for fixed input entropy the QI output size is strictly fixed and obtanable
without having to expand the compressed data. Arihtmetic & Huffman will produce
variation in the output size (Huffman more so) and both need to decode the data
to find out how big the compressed size is. The QI source code (on the web page
given earlier) has a fully functional optimal and very fast permutation coder
which encodes permutations (with n into millions). While it is easy to code
permutation via Huffman code, the compressed data is few percent larger and it
varies in size by 10-15 percent. The QI output is always exactly of the size
log(n!) bits, to within 10^-7 bits and one can know the exact number of
compressed bits from number n alone (while Huffman output size will vary from
permutation to permutation, and you can't know what it is unless you decode the
output). This property of QI makes it very suitable for encoding data which
needs to be accessed from its compressed form, skipped over from one to another
compressed chunk of data.

Another advantage of QI is the ability to code even small chunks of input
optimally i.e. its O(1) and log(n) redundancy terms are smaller than those of
Arithmetic coder (which is not suitable for inputs of few tens or even hundreds
of bits), as you can see from the test results table on page 10 in the preprint,
showing the relative compression gain of QI vs AC increasing with smaller input
sizes.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.