Computer Chess Club Archives


Search

Terms

Messages

Subject: O(n) for radix/bucket/counting sort is something of a misnomer

Author: Dann Corbit

Date: 10:45:33 05/11/01

Go up one level in this thread


On May 11, 2001 at 03:33:07, David Rasmussen wrote:
[snip]
>>So all of them are sorted in O(n) then!
>>
>
>I know. Radix Sort is the same.

O(n) is not generally used to describe bucket sorting, radix sorting and the
like anymore.  That's because keysize is so important.

Generally, it is O(n*log(keysize/keybucketsize))

Consider a character SSN (555-55-5555). We can shrink out the dashes, because
they have no value in the search.  That gives us 9 buckets.

LSD radix sort will do 1 counting pass and one distribution pass for each digit
(in this case, we can use a character).  That means for 9 buckets, we have 18
passes of the data.

In order for (for instance) relaxed-heapsort to do worse {it's O(n*log(n))} n
would have to be 2^18=262144 elements, [roughly speaking, of course] for break
even.  If you have a long key (say 30 characters) now it takes 2^60th elements
for LSD radix sort to be faster.

Of course, you can do some important optimizations.  You can (for instance)
count all the buckets for the entire key in one pass.  Still, with long keys,
that won't help much as far as causing those sorts to beat an ordinary,
efficient O(n*log(n)) sort.

O(n) is really masking the true complexity of that kind of sorting.  There's a
discussing of this stuff in chapter 13 of the book "C Unleashed"; but the guy
who wrote that chapter is something of a whinging twit, so I don't know if you
could trust it.




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.