The first compressed index to offer simultaneous state-of-the-art compression, constant-time random access, and high throughput for arbitrary integer sequences.
The classic Elias-Fano encoding is the gold standard for compression, but it is strictly limited to sorted (monotonic) sequences.
When it comes to non-sorted data, there are many solutions, but all of them fail to simultaneously combine high compression ratios, high-throughput, and fast random access support.
GEF bridges this gap by exploiting runs and gaps in the high-order bits of integers.
It unlocks compression performance typical of general purpose compressors (e.g., Brotli and Zstd) while retaining the ability to access any element in constant time.
For some real-world datasets (e.g., Time Series), GEF even approaches the theoretical lower bound.
GEF offers four specialized templates tailored to different data distributions. Use this guide to select the right one.
Fastest Access
Optimized for sequences with "runs" of values sharing high-order bits.
Speed-critical applications & clustered data.
Mostly Sorted
Tailored for sequences with mostly positive small gaps and occasionally large gaps.
Graph adjacency lists, timestamps & sorted runs.
Bidirectional
Tailored for sequences with mostly small gaps (regardless of sign) and occasionally large gaps.
Sensors, fluctuating logs, volatile time series with spikes.
Max Compression
Most compact for small gaps. Second fastest after RLE-GEF. Offers space guarantees & theoretical optimality (quasi-succinctness).
Time series with small gaps where spikes are unusual.
GEF consistently sits on the Pareto frontier of compression ratio vs. access speed. It matches general-purpose compressors (Xz, Brotli) in space while being orders of magnitude faster.
than general-purpose compressors (e.g. Brotli, Xz) and SOTA learned compressors (e.g. NeaTS).
among compressors with high throughput and random access.
Throughput vs Ratio
Pareto Frontier
Detailed comparisons against Brotli, Xz, Zstd, NeaTS, ALP, LeCo, and Falcon available in the benchmarks repository.
Header-only library. easy to integrate via CMake.
#include <gef/gef.hpp>
#include <vector>
int main() {
// 1. Define an arbitrary (unsorted) sequence
std::vector<uint32_t> sequence = {1, 3, 7, 11, 15, 23, 31, 20, 10, 5};
// 2. Create index (B-GEF handles non-monotone automatically)
gef::B_GEF<uint32_t> index(sequence);
// 3. Constant-time random access
uint32_t val = index[3]; // Returns 11
// 4. Efficient Range Decompression
std::vector<uint32_t> buffer(5);
index.get_elements(2, 5, buffer); // Decodes 7, 11, 15, 23, 31
return 0;
}