Sux is an umbrella nickname for the results of my fiddling with the implementation of basic succinct data strucures.
The resulting code is rather sparse. The main highlights are:
- a novel, broadword-based implementation of rank/select queries for up to 264 bits that is highly competitive with known 32-bit implementations on 64-bit architectures (additional space required is 25% for ranking and 12.5%-37.5% for selection);
- several Java structures using the Elias–Fano representation of monotone sequences for storing pointers, variable-length bit arrays, etc.
- Java code implementing minimal perfect hashing using around 2.68 bits per element (also using some broadword ideas);
- a few Java implementations of monotone minimal perfect hashing.
C++ sucks. No, that's not the reason. The problem is that if you want to compare experimentally your rank/select implementations you need a language that puts you in control—every single computational aspect, including cache misses, perturbs the result.
The C++ code in Sux just uses C++ for namespace handling. No object-oriented or otherwise bizarre feature of the language is used.
Writing in Java code that (essentially) has to roll bits over and over may seem a Bad Thing™. However, one should take into consideration the following points:
- Improvements in JVMs makes low-level code written in Java faster and faster; often, the performance penalty w.r.t. an equivalent C/C++ application is relatively small.
- Succinct techniques can be mixed in several different ways, and an object-oriented language makes it very easy to play with different implementations of the same interface.
- An object-oriented language make it possible to write interfaces such as BitVector, which provide a wide range of accesses and methods that can be made efficient in suitable classes (e.g., LongArrayBitVector).
- When you're convinced your data structure works, you can easily rewrite it in C/C++.