Under Submission

Generalized Elias-Fano

The first compressed index to offer simultaneous state-of-the-art compression, constant-time random access, and high throughput for arbitrary integer sequences.

Why GEF?

The Limitation

The classic Elias-Fano encoding is the gold standard for compression, but it is strictly limited to sorted (monotonic) sequences.

When it comes to non-sorted data, there are many solutions, but all of them fail to simultaneously combine high compression ratios, high-throughput, and fast random access support.

The GEF Solution

GEF bridges this gap by exploiting runs and gaps in the high-order bits of integers.

It unlocks compression performance typical of general purpose compressors (e.g., Brotli and Zstd) while retaining the ability to access any element in constant time.

For some real-world datasets (e.g., Time Series), GEF even approaches the theoretical lower bound.

Choose Your Variant

GEF offers four specialized templates tailored to different data distributions. Use this guide to select the right one.

RLE-GEF

Fastest Access

Optimized for sequences with "runs" of values sharing high-order bits.

BEST FOR:

Speed-critical applications & clustered data.

U-GEF

Mostly Sorted

Tailored for sequences with mostly positive small gaps and occasionally large gaps.

BEST FOR:

Graph adjacency lists, timestamps & sorted runs.

B-GEF

Bidirectional

Tailored for sequences with mostly small gaps (regardless of sign) and occasionally large gaps.

BEST FOR:

Sensors, fluctuating logs, volatile time series with spikes.

STAR

B*-GEF

Max Compression

Most compact for small gaps. Second fastest after RLE-GEF. Offers space guarantees & theoretical optimality (quasi-succinctness).

BEST FOR:

Time series with small gaps where spikes are unusual.

On the Pareto Frontier

GEF consistently sits on the Pareto frontier of compression ratio vs. access speed. It matches general-purpose compressors (Xz, Brotli) in space while being orders of magnitude faster.

  • Up to 500x faster compression

    than general-purpose compressors (e.g. Brotli, Xz) and SOTA learned compressors (e.g. NeaTS).

  • Up to 42.83% more compact

    among compressors with high throughput and random access.

HIGH PERFORMANCE
  • Up to 5 GB/s compression throughput (multi-threaded)
  • > 1 GB/s decompression speed
  • O(1) Constant-time random access

Throughput vs Ratio
Pareto Frontier

Detailed comparisons against Brotli, Xz, Zstd, NeaTS, ALP, LeCo, and Falcon available in the benchmarks repository.

Simple C++ Interface

Header-only library. easy to integrate via CMake.

main.cpp
#include <gef/gef.hpp>
#include <vector>

int main() {
    // 1. Define an arbitrary (unsorted) sequence
    std::vector<uint32_t> sequence = {1, 3, 7, 11, 15, 23, 31, 20, 10, 5};
    
    // 2. Create index (B-GEF handles non-monotone automatically)
    gef::B_GEF<uint32_t> index(sequence);
    
    // 3. Constant-time random access
    uint32_t val = index[3];  // Returns 11
    
    // 4. Efficient Range Decompression
    std::vector<uint32_t> buffer(5);
    index.get_elements(2, 5, buffer); // Decodes 7, 11, 15, 23, 31
    
    return 0;
}