Probabilistic data structure that estimates the number of unique elements in a set.
hll
extension already enabled.
Let’s create a sample analytics system that efficiently tracks unique user interactions.
We’ll use a fact table to store raw events and leverage HLL (HyperLogLog) for efficient approximate distinct
counting in our aggregated statistics. This pattern is common in analytics systems where you need to track unique users across different time periods while maintaining reasonable storage and query performance.
events
Tablehll
extension provides several hashing functions for different data types:
hll_hash_integer(value)
- for integer valueshll_hash_text(value)
- for text valueshll_hash_bytea(value)
- for binary datahll_hash_any(value)
- for other data typeshll_union_agg
function:
hll
extension in PostgreSQL is an efficient solution for large-scale distinct counting, offering fast and memory-efficient approximations. It is particularly useful for analytics and tracking unique values over time.
For more details, refer to the hll
GitHub repository.