RSS

Deep Dive: Latest N Rows with Row-Level Partition Pruning

Learn how we optimized the Hydrolix query engine for latest N rows queries through concurrent processing and strategic data handling at scale.

Federico Rodriguez

Published:

May 01, 2024

6 minute read
, ,

TL;DR

Starting with Hydrolix 4.10, the following query types see notable performance boosts due to partition-aware optimizations:

For queries 1 and 2, high-level partition pruning is now automatically implemented at the catalog level, as discussed in Latest N Rows Optimized: Crafting Efficiency with the Hydrolix Catalog.

For query type 3, row-level partition pruning stops searching through partitions once sufficient rows match the filter criteria within query peers. This posed unique challenges in our massively parallel system, which are detailed in this post.

Imagine you’d like to find the 5 most recent logs containing the string “error” in a trillion row dataset. You don’t know if those 5 logs occur in the last second or the last hour (and you’d rather not guess) so you do not include a timestamp time range in your WHERE clause.

In a trillion row dataset, you could probably step away for a coffee before a query like this completed. With Hydrolix 4.10,  this query completes in under a second.

Before 4.10:

5 rows in set. Elapsed: 348.683 sec. Processed 1.32 trillion rows, 92.53 TB (3.78 billion rows/s., 268.33 GB/s.)

After 4.10:

5 rows in set. Elapsed: 0.850 sec. Processed 8.82 million rows, 634.57 MB (10.38 million rows/s., 746.44 MB/s.)

To understand how we got here, we must first answer the following question: 

How can we filter on a column and still achieve the partition skipping we get from catalog-driven optimization?

We tackled this filtering problem directly within the Hydrolix query engine.

Early Termination

In versions prior to 4.10, Hydrolix query peers were not “partition-aware” during the row filtering process. Instead, query peers simply pulled rows as quickly as they could. To optimize our query engine for filtered “latest N rows” queries in version 4.10, we implemented early termination on query peers, utilizing partition min and max timestamps to read only necessary partitions.

GIF shows two partitions that don't have overlapping timestamps. The query peers only need to process the most recent partition.

The previous diagram shows the perspective of a query peer checking partitions (denoted as blue cylinders) for relevant rows. It first looks at partition 1, finding several rows that match the query, and now is ready to determine whether it needs to read partition 2. The vertical axis correlates with the timestamp bounds of each partition while the horizontal axis shows the order that the query peer processes the partitions. 

Out of the 10 rows matching the filter, the earliest is 8:01 AM. Since we have more than 5 rows, and we know partition 2 contains data from 5 AM to 7:59 AM, the query peer should skip partition 2 because it cannot contain any rows that are relevant to the query.

By utilizing early termination, the query engine can potentially skip many partitions, leading to significantly faster query execution and reduced resource consumption. 

Excellent! We’re done! Or are we?

Handling Partition Overlap

In Hydrolix’s massively parallel environment, partitions are allowed to overlap, a design choice that offers flexibility and resilience in data management. However, this feature introduces an additional layer of complexity when optimizing queries. Consider the next diagram, which shows a scenario where time ranges overlap.

The GIF shows two partitions with overlapping timestamps. The query peers have found matching rows from the first partition that are more recent than the maximum timestamp of the second partition. However, the query peer doesn't have enough information to skip the second partition.

Here, the time ranges of both partitions intersect. Even though the query peer has already retrieved 5 rows from partition 1 before partition 2’s max timestamp of 10:50 AM and it’s clear to us based on the diagram that partition 2 can be skipped, the query peer can’t just rely on the minimum timestamp to skip partition 2. It needs additional information about the rows it has retrieved.

To solve this issue, we need a better algorithm than using the minimum of the matched rows. So how do we proceed?

Using Bounded Priority Queues

We solved this issue using a bounded priority queue, which has the following useful properties: it consistently maintains the largest N elements and efficiently returns the smallest of these elements in constant time. When a new element is introduced, it replaces the smallest in the queue if the new element is larger.

GIF of a bounded priority queue with 5 items. As higher max values are added, the lowest value is pushed out.

By using a bounded priority queue, the Hydrolix query engine can efficiently keep track of only the most recent matching rows. With this optimization, the query peer can compare the minimum timestamp in the priority queue with the maximum timestamp of a partition. If the partition’s max timestamp is less than this minimum, the partition can’t contain any rows that will be more recent than the rows the query peer has already found, which means it can be skipped.

Let’s take a look at the previous illustration of two overlapping partitions again. With a bounded priority queue, the query peer stores only the 5 most recent timestamps from between 10:55 AM and 10:59 AM. It can compare the minimum timestamp from the queue with the maximum timestamp of partition 2 to determine that it can skip partition 2.

The GIF shows two partitions with overlapping timestamps. The query peers have found matching rows from the first partition that are more recent than the maximum timestamp of the second partition. With a bounded priority queue, the second partition doesn't need to be processed.

Scenarios Where Partitions Can’t Be Skipped

Sometimes the query peer must read the full partition. Here are some of those scenarios:

Perfectly Overlapping Partitions

GIF shows example of four partitions that completely overlap. All of the partitions must be queried.

Since every partition spans from 8am to 11am, each partition might contain more recent rows than the previous. The query peer must process every partition to find the most recent 5 rows.

Max Timestamp Above Query Bounds

GIF shows example of four partially overlapping partitions. The max timestamp of each partition is above the query bounds, so all of the partitions needs to be processed.

Let’s take a look at an example of a query with a primary filter:

In the previous illustration, all four partitions have max timestamps that are greater than our query’s upper time bound. The query peer will never find rows matching the filter that are greater than the max timestamp of any of these partitions. All rows matching the filter must be between 8:45 and 8:10am, so all partitions must be read entirely.

Needle in a Haystack Query

It’s also possible that there are very few rows matching our filter. In this case we must continue searching until we’ve met the LIMIT—in this case, 5 rows meeting the query condition. For example, if the query peer has only found 4 rows that meet the condition, it must search all partitions until it’s found at least 5 before it can start comparing the min timestamp of the priority queue against the max timestamp of a partition.

GIF shows example of a needle in a haystack query. There are four overlapping partitions and each partition has very few rows that match the query (four total across the partitions), so the bounded priority queue is never filled.

Minimizing the Side Effects of Worst Case Scenarios

If we can’t optimize, we can still be efficient.

Typically, we experience significant performance gains by skipping a large number of partitions. However, sometimes optimization isn’t possible. In these situations, the goal is to ensure that optimization efforts don’t have side effects like decreased query performance.

Hydrolix works at a massively parallel scale, grabbing far more than one partition at a time. To compensate for potential issues, we have a novel approach to minimizing the adverse effects of locking. Every block that passes through our system is tagged with the current maximum timestamp to beat. Threads individually compete to find the next skippable partition without waiting for each other. The next diagram illustrates this process.

GIF shows multiple threads receiving blocks of information that are stamped with the maximum timestamp to beat. One of the threads finds a block that exceeds the maximum timestamp.

Conclusion: Optimizing with Precision and Minimalism

By leveraging bounded priority queues and concurrency-aware techniques, we  maximize performance through intelligent partition-skipping and minimize the side effects when direct optimization paths are obscured.

Ultimately, the Hydrolix query engine’s sophisticated design principles reflect an overarching theme: optimizing not just for the best-case scenarios but also ensuring graceful handling of the most demanding queries. This dual focus on performance and reliability ensures that Hydrolix remains at the cutting edge of data processing technology, providing rapid, reliable results across all scales of operation.

For Hydrolix users: To verify if your query utilized this optimization, you can check the limit_optimization section in the query log. If for any reason you’d like to disable this optimization, you can do so using the query setting hdx_query_optimize_order_by_primary = false.

Next Steps

Learn more about Hydrolix and contact us for a POC.

Interested in working at Hydrolix? See our available roles.

Share this post…

Ready to Start?

Cut data retention costs by 75%

Give Hydrolix a try or get in touch with us to learn more

TL;DR

Starting with Hydrolix 4.10, the following query types see notable performance boosts due to partition-aware optimizations:

For queries 1 and 2, high-level partition pruning is now automatically implemented at the catalog level, as discussed in Latest N Rows Optimized: Crafting Efficiency with the Hydrolix Catalog.

For query type 3, row-level partition pruning stops searching through partitions once sufficient rows match the filter criteria within query peers. This posed unique challenges in our massively parallel system, which are detailed in this post.

Imagine you’d like to find the 5 most recent logs containing the string “error” in a trillion row dataset. You don’t know if those 5 logs occur in the last second or the last hour (and you’d rather not guess) so you do not include a timestamp time range in your WHERE clause.

In a trillion row dataset, you could probably step away for a coffee before a query like this completed. With Hydrolix 4.10,  this query completes in under a second.

Before 4.10:

5 rows in set. Elapsed: 348.683 sec. Processed 1.32 trillion rows, 92.53 TB (3.78 billion rows/s., 268.33 GB/s.)

After 4.10:

5 rows in set. Elapsed: 0.850 sec. Processed 8.82 million rows, 634.57 MB (10.38 million rows/s., 746.44 MB/s.)

To understand how we got here, we must first answer the following question: 

How can we filter on a column and still achieve the partition skipping we get from catalog-driven optimization?

We tackled this filtering problem directly within the Hydrolix query engine.

Early Termination

In versions prior to 4.10, Hydrolix query peers were not “partition-aware” during the row filtering process. Instead, query peers simply pulled rows as quickly as they could. To optimize our query engine for filtered “latest N rows” queries in version 4.10, we implemented early termination on query peers, utilizing partition min and max timestamps to read only necessary partitions.

GIF shows two partitions that don't have overlapping timestamps. The query peers only need to process the most recent partition.

The previous diagram shows the perspective of a query peer checking partitions (denoted as blue cylinders) for relevant rows. It first looks at partition 1, finding several rows that match the query, and now is ready to determine whether it needs to read partition 2. The vertical axis correlates with the timestamp bounds of each partition while the horizontal axis shows the order that the query peer processes the partitions. 

Out of the 10 rows matching the filter, the earliest is 8:01 AM. Since we have more than 5 rows, and we know partition 2 contains data from 5 AM to 7:59 AM, the query peer should skip partition 2 because it cannot contain any rows that are relevant to the query.

By utilizing early termination, the query engine can potentially skip many partitions, leading to significantly faster query execution and reduced resource consumption. 

Excellent! We’re done! Or are we?

Handling Partition Overlap

In Hydrolix’s massively parallel environment, partitions are allowed to overlap, a design choice that offers flexibility and resilience in data management. However, this feature introduces an additional layer of complexity when optimizing queries. Consider the next diagram, which shows a scenario where time ranges overlap.

The GIF shows two partitions with overlapping timestamps. The query peers have found matching rows from the first partition that are more recent than the maximum timestamp of the second partition. However, the query peer doesn't have enough information to skip the second partition.

Here, the time ranges of both partitions intersect. Even though the query peer has already retrieved 5 rows from partition 1 before partition 2’s max timestamp of 10:50 AM and it’s clear to us based on the diagram that partition 2 can be skipped, the query peer can’t just rely on the minimum timestamp to skip partition 2. It needs additional information about the rows it has retrieved.

To solve this issue, we need a better algorithm than using the minimum of the matched rows. So how do we proceed?

Using Bounded Priority Queues

We solved this issue using a bounded priority queue, which has the following useful properties: it consistently maintains the largest N elements and efficiently returns the smallest of these elements in constant time. When a new element is introduced, it replaces the smallest in the queue if the new element is larger.

GIF of a bounded priority queue with 5 items. As higher max values are added, the lowest value is pushed out.

By using a bounded priority queue, the Hydrolix query engine can efficiently keep track of only the most recent matching rows. With this optimization, the query peer can compare the minimum timestamp in the priority queue with the maximum timestamp of a partition. If the partition’s max timestamp is less than this minimum, the partition can’t contain any rows that will be more recent than the rows the query peer has already found, which means it can be skipped.

Let’s take a look at the previous illustration of two overlapping partitions again. With a bounded priority queue, the query peer stores only the 5 most recent timestamps from between 10:55 AM and 10:59 AM. It can compare the minimum timestamp from the queue with the maximum timestamp of partition 2 to determine that it can skip partition 2.

The GIF shows two partitions with overlapping timestamps. The query peers have found matching rows from the first partition that are more recent than the maximum timestamp of the second partition. With a bounded priority queue, the second partition doesn't need to be processed.

Scenarios Where Partitions Can’t Be Skipped

Sometimes the query peer must read the full partition. Here are some of those scenarios:

Perfectly Overlapping Partitions

GIF shows example of four partitions that completely overlap. All of the partitions must be queried.

Since every partition spans from 8am to 11am, each partition might contain more recent rows than the previous. The query peer must process every partition to find the most recent 5 rows.

Max Timestamp Above Query Bounds

GIF shows example of four partially overlapping partitions. The max timestamp of each partition is above the query bounds, so all of the partitions needs to be processed.

Let’s take a look at an example of a query with a primary filter:

In the previous illustration, all four partitions have max timestamps that are greater than our query’s upper time bound. The query peer will never find rows matching the filter that are greater than the max timestamp of any of these partitions. All rows matching the filter must be between 8:45 and 8:10am, so all partitions must be read entirely.

Needle in a Haystack Query

It’s also possible that there are very few rows matching our filter. In this case we must continue searching until we’ve met the LIMIT—in this case, 5 rows meeting the query condition. For example, if the query peer has only found 4 rows that meet the condition, it must search all partitions until it’s found at least 5 before it can start comparing the min timestamp of the priority queue against the max timestamp of a partition.

GIF shows example of a needle in a haystack query. There are four overlapping partitions and each partition has very few rows that match the query (four total across the partitions), so the bounded priority queue is never filled.

Minimizing the Side Effects of Worst Case Scenarios

If we can’t optimize, we can still be efficient.

Typically, we experience significant performance gains by skipping a large number of partitions. However, sometimes optimization isn’t possible. In these situations, the goal is to ensure that optimization efforts don’t have side effects like decreased query performance.

Hydrolix works at a massively parallel scale, grabbing far more than one partition at a time. To compensate for potential issues, we have a novel approach to minimizing the adverse effects of locking. Every block that passes through our system is tagged with the current maximum timestamp to beat. Threads individually compete to find the next skippable partition without waiting for each other. The next diagram illustrates this process.

GIF shows multiple threads receiving blocks of information that are stamped with the maximum timestamp to beat. One of the threads finds a block that exceeds the maximum timestamp.

Conclusion: Optimizing with Precision and Minimalism

By leveraging bounded priority queues and concurrency-aware techniques, we  maximize performance through intelligent partition-skipping and minimize the side effects when direct optimization paths are obscured.

Ultimately, the Hydrolix query engine’s sophisticated design principles reflect an overarching theme: optimizing not just for the best-case scenarios but also ensuring graceful handling of the most demanding queries. This dual focus on performance and reliability ensures that Hydrolix remains at the cutting edge of data processing technology, providing rapid, reliable results across all scales of operation.

For Hydrolix users: To verify if your query utilized this optimization, you can check the limit_optimization section in the query log. If for any reason you’d like to disable this optimization, you can do so using the query setting hdx_query_optimize_order_by_primary = false.

Next Steps

Learn more about Hydrolix and contact us for a POC.

Interested in working at Hydrolix? See our available roles.