Hamburg (Germany), the 29th April 1997. Written by Nils Pipenbrinck aka Submissive/Cubic & $eeN

This is a short tutorial on Radix-sort. If you already know what radix-sort is, how it works and you don't use quicksort anymore you can skip this tutorial. The tutorial is intended for programmers who have to write a fast sorting algorithm for small and medium sized lists. So if you want to sort a database you should search the web for another tutorial. If you however write a 3d-engine or such stuff you should read on because radix-sort is very good for this purpose.

How do you sort a list if numbers? Basically you have an algorithm which compares numbers and tries to get the list more and more sorted. The Radix-sort is a little bit different. It doesn't compare anything, it sorts with several passes the numbers. Lets first start with a list of bytes. A very simple radix-sort for bytes would work like this:

- source
- List of bytes
- source_n
- number of bytes to sort
- dest[256]
- 256 lists of bytes. each list should have enough space to hold source_n elements.

a pseudo-code would sort the list this way:

for i=0 to source_n do dest[source[i]].append(source[i])

this means you append all items with value #1 to list 1, all items with value #2 to list 2 and so on. As you can see this algorithm runs very fast (it only copies the bytes in a very tight loop). The backdraw is, that you need a lot of memory to sort the list because you don't know how many items of each type you have.

Do you really need to know how many items of each type you have? No, you can count them and optimize your memory-requirements. We'll need another loop to count the distribution of the source-values:

int distribution[256] // fill the list with zeros. for i=0 to 255 do distribution[i] = 0; // build a distribution history: for i=0 to source_n do distribution[source[i]] = distribution[source[i]] + 1; // Now we build a index-list for each possible element: int index[256]; index[0] = 0; for i=1 to 255 do index[i] = index[i-1] + distribution[i-1];

adapting the old and simple radix-sort to take use of the new information will result in this code:

dest: array of bytes with space for source_n bytes. for i = 0 to source_n do dest[index[source[i]]] = source[i]; index[source[i]] = index[source[i]] + 1;

After this dest is the sorted source-array. As you can see we only need the same amount of memory as the sourcelist. This is quite a saving, and if your lists don't contain millions of values we can live with the memory-requirement.

Who want's to sort bytes? You? I norally sort ascii-texts, floats or longints. Because I only want to explain how the extenstion works I'll go on with longints. If you want increase the size of the elements to sort you can do this in several ways. Either you increase the size of index and distribution or you sort in several passes. Increasing the size of the index only works for short values (sorting smallints would require at least 65536*2 bytes for index and distribution list which finally needs 256 kb of memory.. I think this is to much (only think about how long it takes to clear the memory and build the index-list.. In this time quicksort may have sorted your list already)). Now we need a clever way to sort more bits with the same memory-amount. We can do this with several passes. Let's do it in decimal.

unsorted list: 523 153 088 554 235 sorting for Radix 0 (least significant digit) 523 153 554 235 088 ^ sorting for Radix 1 (2nd. significant digit) 523 235 153 554 088 ^ sorting for Radix 2 (most. significant digit) 088 153 235 523 554 ^

As you can se we preserve the old sorting order and advance by one digit until we've sorted all digits. This illustration did this in decimal. The native digit-format for computers are bits, bytes, words and dwords.

I know.. real programmers code in assembler. (Hey! I thought you want to understand radix-sort, not assembly). The program compiles well under watcom c++. However, it should at least work on all 32-bit c compilers and on some 16 bit compilers.

#include<iostream>#include<cstdlib>#include<cstring>#include<cassert>#ifdef_WINDOWStypedefunsigned long uint32_t;typedefunsigned char uint8_t;#else#include<stdint.h>#endifstaticvoidradix(int byte,constunsigned N,constuint32_t *source, uint32_t *dest) { unsigned count[256]; unsigned index[256];memset(count, 0,sizeof(count));for(unsigned i=0; i<N; ++i) count[((source[i])>>(byte*8))&0xff]++; index[0]=0;for(unsigned i=1; i<256; ++i) index[i] = index[i-1] + count[i-1];for(unsigned i=0; i<N; ++i) dest[index[((source[i])>>(byte*8))&0xff]++] = source[i]; } voidradixsort(uint32_t *source,constunsigned N) { uint32_t *temp =newuint32_t[N];radix(0, N, source, temp);radix(1, N, temp, source);radix(2, N, source, temp);radix(3, N, temp, source);delete[] temp; }staticvoidmake_random(uint32_t *data,constunsigned N) {for(unsigned i=0; i<N; ++i) data[i] =rand() | (rand()<<16); }staticvoidcheck_order(constuint32_t *data, unsigned N) {for(--N ; N > 0; --N, ++data)assert(data[0] <= data[1]); } intmain() { uint32_t data[100];make_random(data, 100);radixsort(data, 100);check_order(data, 100);return0; }

I received some reactions from folks, who read this tutorial. There are some interesting sidenotes and implementation details which might be interesting for you.

- Andre Reinald rewrote my example source and did some nice things to speed it up. All in all it's the same code, but some implementation details are more compiler friendly and lead to better code. Click here to download his code.
- Update 6.4.2001: Andre sent me a new version of this code where he cleaned up some things. As he said he removed a couple of shift operations which could make the compiled code a little bit faster. But it took 8 years until 2009 before Paul Harris tested the code and found that it sorted in reverse order. Click here for the new version.
- Update 2011-07-18: And another 2 years later Ryan Rohrer found out the the previous version still was not correct, because it did the radix sort in the wrong order. The above version is fixed now, but Ryan also wrote a more modern C++ version: radix_ar_2011.cpp.
- Update 2011-09-04: Tomy Novella pointed out an off-by-one error in the pseudo-code.

I hope you learned something. I have to thank Climax from Amable for makeing me interested in this algorithm. (well, some weeks before Assembly 1996 I still thought Quicksort would be the best algorithm to sort. Well, for ASCII-strings it still is, but for an engine... Hm.. Radix sort really rocks.) If you want to contact me you might write me a mail. Some of you might also notice, that this tutorial has been released at to the hornet archive in late 1996. However, I think this page is a better place for it.