Maximizing Query Performance for 100+ Billion Row Data Sets

Learn how Hydrolix delivers sub-second query latency across massive data sets with micro-indexing, predicate pushdown, and parallelism.

Franz Knupfer

Published: Feb 13, 2024

8 minute read

Finding a very specific query match in a trillion-row database is a lot like looking for a needle in a haystack. Hydrolix consistently delivers sub-second latency for ad hoc queries across very large (100+ billion row) data sets while maintaining the lowest unit costs in the industry. So how do we achieve SSD-like performance with cost-effective object storage? This post will attempt to answer that question, with the caveat that some of the details are part of our secret sauce. You’ll learn the basics of how Hydrolix makes needle-in-a-haystack queries highly performant by utilizing techniques like:

  • Optimizing data for query performance at ingest time
  • Background compaction
  • Partition pruning 
  • Column pruning
  • Predicate pushdown with micro-indexing
  • Stateless computing, dynamic partitioning, massive parallelism, and on-demand scalability

Optimizing Data for Query Performance at Ingest Time

While this post will mostly focus on what happens at query time, achieving superior performance is only possible because of how Hydrolix automatically ingests, transforms, indexes, compresses, summarizes, and compacts data before (and after) writing it to commodity object storage. These optimizations include sparse indexes on all dimensions, micro-indexing data within each partition, and high-density compression.

Micro-indexing, a process unique to Hydrolix, involves using sparse indexing at the partition level to index column segments. With micro-indexing, a query can use predicate pushdown to retrieve individual column segments, making it much more efficient than retrieving an entire column.

Here’s how these optimizations improve query performance:

  • Partitioning based on the timing of events allows for partition pruning, automatic support for out-of-order and late-arriving data, and flexible, massively parallel, stateless computing.
  • Micro-indexing avoids full-column scans, enables advanced predicate pushdown, and removes any dependency on caching for performance.
  • Hydrolix’s industry-leading compression maximizes I/O throughput when pulling data segments from object storage and also minimizes the size of indexes, making them cheap to create.

A Hydrolix partition consists of three files: manifest.hdx, index.hdx, and data.hdx. Hydrolix is optimized for time series data, so partitions are organized by event time windows ranging from a few seconds for new partitions to an hour for older partitions.

Background Compaction

Automated and intelligent background compaction balances the need to make incoming data immediately available with the benefits of larger, more efficient partitions.

Because Hydrolix is optimized for time series data, each partition has a maximum aggregated time window. Partitions are smallest at ingest time, with a default limit of five seconds or one million rows (whichever is reached first).

Over time, Hydrolix’s background merge service intelligently harvests these micro-partitions, concatenating their data into larger, more efficient partitions. The merge service automatically reorders late-arriving data and improves compression and performance. Older data is stored in large partitions capped at 4 GB or a one-hour time window.

Many data platforms have challenges with late-arriving and out-of-order data. This is not the case with Hydrolix, which orders late-arriving data by event time (not ingest time) during the partition merge process.
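
To make this lifecycle concrete, here is a minimal Python sketch of the policy described above. The thresholds mirror the defaults mentioned in this post, but the data structures and merge logic are simplified stand-ins, not Hydrolix internals.

    from dataclasses import dataclass

    # Illustrative constants matching the defaults described in this post.
    MICRO_MAX_SECONDS = 5          # seal a new partition after five seconds...
    MICRO_MAX_ROWS = 1_000_000     # ...or one million rows, whichever comes first
    MERGED_MAX_SECONDS = 3600      # merged partitions cover at most one hour
    MERGED_MAX_BYTES = 4 * 2**30   # ...and hold at most 4 GB of data

    @dataclass
    class MicroPartition:          # hypothetical stand-in for a partition
        age_seconds: float
        rows: list                 # each row is assumed to expose an event_time

    def should_seal(p: MicroPartition) -> bool:
        """Decide whether an actively written micro-partition is full."""
        return p.age_seconds >= MICRO_MAX_SECONDS or len(p.rows) >= MICRO_MAX_ROWS

    def merge(micro_partitions):
        """Concatenate micro-partitions, ordering rows by event time (not
        ingest time) so that late-arriving data lands in the right place."""
        rows = [row for p in micro_partitions for row in p.rows]
        rows.sort(key=lambda row: row.event_time)
        return rows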

Partition Pruning

Hydrolix caches a small amount of information about each partition in a transactional catalog. This includes the storage location of the partition files (which can be spread across multiple cloud object stores), the min and max primary timestamp of each partition, and any secondary shard keys. Primary timestamps are user-configurable, and generally represent the event time of the record, rather than the time the event was received and indexed by Hydrolix. This allows more flexibility when handling out-of-order and late-arriving data.

When a user makes a time-filtered query in Hydrolix, the query head checks the catalog to determine which partitions contain the time range of the request. All other partitions are removed from consideration, a process known as partition pruning. The next image shows a simplified illustration of the process.

Image shows an alert icon on the left with a timestamp range of 9:07 AM to 9:09 AM. To the right of that are three partitions, each with a different timestamp range. The middle partition’s timestamp range matches the alert range, and it is highlighted on the right as the only partition that needs to be queried.

The image shows an example where an incident occurred between 9:07 AM and 9:09 AM. In this example, two partitions can be pruned because they don’t include the time range of the query. For a large dataset in Hydrolix, partition pruning can eliminate thousands of partitions from consideration.

For the majority of queries, partition pruning significantly reduces the amount of data that needs to be retrieved. This all happens before any requests are made to object storage.
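
In code, partition pruning reduces to an interval-overlap check against the catalog. Here is a rough Python sketch with invented field names; as noted above, the real catalog also tracks storage locations and secondary shard keys.

    from dataclasses import dataclass

    @dataclass
    class CatalogEntry:        # hypothetical catalog record for one partition
        location: str          # where the partition's files live in object storage
        min_ts: float          # earliest primary (event) timestamp in the partition
        max_ts: float          # latest primary (event) timestamp in the partition

    def prune(catalog, query_start, query_end):
        """Keep only partitions whose [min_ts, max_ts] range overlaps the
        query's time range. No request to object storage has been made yet."""
        return [entry for entry in catalog
                if entry.max_ts >= query_start and entry.min_ts <= query_end]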

Columnar Databases and Column Pruning

A Hydrolix database is column-oriented, not row-oriented. Columnar databases have several major benefits over row-based databases for analytical queries of large data sets. First, columnar data typically compresses much better than row-based data. The reduced storage footprint also benefits query performance: less data on disk means less data to transfer from cloud storage, further improving efficiency.

With a columnar database, a query can prune columns and return only the ones it needs. For example, if you want to check the status codes of each of your services (either for a dashboard or a quick ad hoc query), you might only need two columns: a service identifier such as service_name and a column (such as http_response_code) that contains the response code. All other columns are pruned, reducing the amount of data a request returns.
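
As a toy illustration in Python, with the table modeled as a dict of column arrays and the column names taken from the example above:

    table = {
        "service_name":       ["checkout", "search", "auth"],
        "http_response_code": [200, 500, 200],
        "request_body":       ["...", "...", "..."],  # large column, never read here
        # ...potentially hundreds more columns
    }

    def project(table, needed_columns):
        """Column pruning: touch only the columns the query asks for."""
        return {name: table[name] for name in needed_columns}

    status_view = project(table, ["service_name", "http_response_code"])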

Predicate Pushdown

An ad hoc query using many query peers in parallel can trigger hundreds or even thousands of concurrent HTTP requests. To minimize latency, it’s essential to transfer as little data as possible via those requests. To do so, Hydrolix and other data platforms use predicate pushdown, a technique that filters data close to the data source instead of first transferring data and then filtering it.

As an analogy, imagine the process of making a sandwich. Without predicate pushdown, you would take everything out of the refrigerator before gathering the ingredients you need and assembling the sandwich—a process that would be both terribly messy and inefficient. Instead, you should just take the ingredients you need while you’re looking in the fridge.

Some data solutions apply predicate pushdown at the column or row level. In addition to applying predicate pushdown at the column level, Hydrolix takes predicate pushdown a step further, applying it to individual column segments.
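
The difference is easy to see in a toy Python sketch; this stands in for the concept, not the Hydrolix implementation:

    REMOTE_STORAGE = [                    # pretend this lives in object storage
        {"service": "search", "code": 500},
        {"service": "auth",   "code": 200},
        {"service": "search", "code": 200},
    ]

    def scan_then_filter(predicate):
        # Without pushdown: transfer every row over the network, then filter locally.
        transferred = list(REMOTE_STORAGE)            # full transfer
        return [row for row in transferred if predicate(row)]

    def filter_at_source(predicate):
        # With pushdown: evaluate the predicate where the data lives;
        # only matching rows ever cross the wire.
        return [row for row in REMOTE_STORAGE if predicate(row)]

    errors = filter_at_source(lambda row: row["code"] >= 500)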

Maximizing Predicate Pushdown with Micro-Indexing

Most columnar databases index columns, but Hydrolix also sparsely indexes column segments within each partition. By doing so, Hydrolix queries can avoid full column scans and download as little data as possible for in-memory processing. The next image depicts how a partial column scan retrieves less data than either a full table scan or a full column scan.

Illustration shows comparison of row-based db, columnar db, and Hydrolix. With row-based db, the entire table must be scanned. With columnar db, individual columns can be scanned. Meanwhile, Hydrolix can scan individual column segments.

With a row-based database, you typically need to do a full table scan even if you only need data from a few columns. Meanwhile, a typical columnar database narrows results to just the columns you need. Hydrolix narrows results even further with partial column scans, making queries more efficient.

To understand how partial column scans work, you first need to know about the structure of a Hydrolix partition. Each partition contains three files (modeled in the sketch after this list):

  • manifest.hdx: This is by far the smallest file. It contains the location of every column index in the index.hdx file as well as the location of every segment of data in data.hdx.
  • index.hdx: Contains sparse indexes for every dimensional column to enable partial column scans.
  • data.hdx: Compressed columnar data is stored in data.hdx. Not surprisingly, it’s the largest file. As an example, it might store five minutes’ worth of log data from a service, and each row could have high dimensionality (for example, hundreds of columns).
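
Here is one way to picture those relationships in code. The field names are hypothetical stand-ins; the actual .hdx layout is part of the secret sauce.

    from dataclasses import dataclass

    @dataclass
    class ByteRange:
        offset: int            # byte offset within the file
        length: int            # number of bytes to read

    @dataclass
    class ManifestEntry:       # one per column, decoded from manifest.hdx
        column: str
        index_range: ByteRange # where this column's sparse index sits in index.hdx
        segment_ranges: list   # where each column segment sits in data.hdx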

The next animation shows how a query peer (a Kubernetes process running in parallel with other peers) interacts with each of the files in a partition.

As the animation shows, manifest.hdx and index.hdx are the smallest files while data.hdx is the largest. Meanwhile, the dark blue blocks represent the data that a query peer will retrieve from the partition during query time. In the case of ad hoc “needle in a haystack” queries, the peer needs only a small fraction of a partition’s data, making the process extremely efficient.

Let’s take a closer look at how the process works.

As the animation shows, a peer first retrieves manifest.hdx from the partition. Note that the file is very small, so it adds minimal overhead to a query. Manifest files are also cached on query peers. This can reduce the number of HTTP requests even further, though Hydrolix does not rely on cache hits for performance.
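
That kind of caching can be expressed as a memoized fetch; a minimal sketch, with fetch_bytes standing in for a hypothetical HTTP download helper:

    from functools import lru_cache

    def fetch_bytes(url):
        """Hypothetical helper: issue an HTTP GET and return the response body."""
        ...

    @lru_cache(maxsize=4096)
    def get_manifest(partition_url):
        # Cache hit: no HTTP request at all. Cache miss: one small GET.
        return fetch_bytes(partition_url + "/manifest.hdx")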

The manifest.hdx file is essentially a set of directions to the contents of its partition, so the peer can use it to construct narrow byte-range requests for portions of the index.hdx and data.hdx files.

Next, the peer lazy-loads data from index.hdx. Only narrow byte ranges are retrieved from the file, as represented by the dark blue blocks in the illustration of index.hdx. Because index.hdx is lazy-loaded, the peer can start requesting data from data.hdx while it is still retrieving data from index.hdx.

Additional queries to index.hdx and data.hdx happen in parallel. Since columns in data.hdx are sparsely indexed, the peer can retrieve column segments and avoid full column scans. This predicate pushdown at the level of column segments is only possible because of micro-indexing.
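
Putting the flow together, here is a hedged sketch of what a peer’s requests could look like at the HTTP level, using standard Range headers against object storage. It reuses the ManifestEntry sketch above; the URLs and the matching_segments helper are illustrative placeholders, not the actual Hydrolix client.

    import urllib.request

    def fetch_range(url, offset, length):
        """Fetch one narrow byte range from object storage using a standard
        HTTP Range header."""
        request = urllib.request.Request(url)
        request.add_header("Range", f"bytes={offset}-{offset + length - 1}")
        with urllib.request.urlopen(request) as response:
            return response.read()

    def matching_segments(index_bytes, segment_ranges, predicate):
        """Hypothetical decode step: use the sparse index to decide which
        column segments could possibly contain rows matching the predicate."""
        return []  # placeholder; the real logic is part of the secret sauce

    def query_partition(base_url, manifest_entries, predicate):
        """Scan one partition by issuing narrow byte-range requests only."""
        results = []
        for entry in manifest_entries:          # decoded from manifest.hdx
            index_bytes = fetch_range(base_url + "/index.hdx",
                                      entry.index_range.offset,
                                      entry.index_range.length)
            for seg in matching_segments(index_bytes, entry.segment_ranges, predicate):
                results.append(fetch_range(base_url + "/data.hdx",
                                           seg.offset, seg.length))
        return results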

Micro-indexing and maximizing the benefits of predicate pushdown are key reasons why Hydrolix achieves superior performance with needle-in-a-haystack queries.

Parallelism and Scalability

Hydrolix components are stateless and decoupled. You can scale each part of the system up for increased performance or down for decreased costs. The next image shows how you can scale peers for both ingest and query. Meanwhile, the Hydrolix database uses object storage that’s endlessly scalable and pay-as-you-go.

Image shows how Hydrolix collects, retains and queries data by using parallelism. The stream head oversees a scalable number of stream peers while the query head oversees a scalable number of query peers. And data is compressed 20x-50x.

In the case of queries, query peers run in parallel to maximize performance, and scaling up the number of peers improves performance further. Meanwhile, you might want to scale down either ingest (stream peers) or query (query peers) for cost savings. You can also scale the number of stream heads and query heads. Some teams may only need scaled-up query capacity during incidents and can scale it down at other times. Off-hours such as evenings and weekends are also opportunities to save on compute costs.
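
As a loose illustration of the fan-out, here is a sketch using Python’s standard thread pool. In Hydrolix the workers are separate query peer processes; the thread pool, the partition attributes, and the query_partition helper sketched earlier are all stand-ins.

    from concurrent.futures import ThreadPoolExecutor

    def parallel_scan(partitions, predicate, workers=64):
        """Fan the query out across many workers, one partition scan per task."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(query_partition, p.location, p.manifest, predicate)
                       for p in partitions]   # p.location / p.manifest are hypothetical
            return [future.result() for future in futures]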

You can use precision query scaling to separate query compute by team and service. As an example, you might have a remediation team that needs the flexibility to scale up compute quickly during an incident. One workload will not impact the performance of another, preventing the “noisy neighbor” problem, in which the compute needs of one team conflict with those of another and degrade performance.

Next Steps

We can’t delve any deeper into how Hydrolix queries work because it’s all part of the secret sauce. While we can’t share the recipe, we invite you to try it out. Learn more about Hydrolix and contact us for a POC.
