Setting Up Query Circuit Breakers

Learn how to use query circuit breakers to limit wasteful query compute and reduce your overall cloud spend.

Franz Knupfer

Published: Mar 19, 2024

When you’re working with big data, you can quickly run up a huge cloud bill with compute-intensive queries. There are plenty of horror stories about astronomical bills like the $14,000 BigQuery charge from a script that ran over a couple of hours.

Hydrolix is a powerful tool for efficient, sub-second queries of datasets that contain tens of billions of rows. However, inefficient queries can still add up over time and bloat your cloud spend. Fortunately, there are plenty of safeguards you can put in place to reduce wasteful queries, including circuit breakers.

Recent blog posts covered how you can use precision query scaling and custom views in Hydrolix. Precision query scaling allows you to separate query usage into different query pools and manage the amount of compute that each pool uses, which helps you prevent resource contention and limit compute costs. Meanwhile, you can use custom views to specify which columns queries should use, meaning that a SELECT * will be limited to those columns instead of querying every column in a high-dimensionality database.

In this post, you’ll learn about another technique for limiting wasteful query compute: circuit breakers. A circuit breaker cancels a query if it exceeds a specified limit. For example, you can set circuit breakers for the maximum number of rows, the maximum size of a query result in bytes, or the maximum time range of a query.
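To make the idea concrete, here is a minimal Python sketch of a row-count circuit breaker. It is purely illustrative and does not describe how Hydrolix implements circuit breakers internally; the names QueryCancelled and read_with_row_limit are hypothetical.

```python
from typing import Iterable, Iterator


class QueryCancelled(Exception):
    """Raised when a query exceeds its configured limit (hypothetical name)."""


def read_with_row_limit(rows: Iterable[dict], max_rows: int) -> Iterator[dict]:
    """Yield rows until max_rows is exceeded, then cancel the whole query."""
    for count, row in enumerate(rows, start=1):
        if count > max_rows:
            # Trip the breaker: stop working instead of processing more data.
            raise QueryCancelled(f"query exceeded max_rows={max_rows}")
        yield row
```

Note the difference from simply truncating output: once the limit is crossed, the query fails loudly, so the caller has to narrow the query or explicitly raise the limit.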

Setting Circuit Breakers at the System Level

There are multiple ways to set a circuit breaker on a query. At the system level, a circuit breaker applies to all queries. You can also add a circuit breaker at the query level through an HTTP header, query string parameters, or even the SQL statement itself.

To set a circuit breaker at the system level, you can use the Hydrolix API. By default, Hydrolix doesn’t set any circuit breakers. However, for most big data use cases, it’s highly recommended to enable the hdx_query_timerange_required circuit breaker.

This circuit breaker automatically cancels a query that doesn’t include a timestamp filter. Hydrolix is optimized for time series data, which means data is partitioned by minimum and maximum timestamps. A time-filtered query uses partition pruning to automatically remove most Hydrolix partitions from query consideration. Time-filtered queries in Hydrolix are extremely performant and use far fewer compute resources.

On the other hand, a query that doesn’t include a timestamp filter will open every partition in your database. Even with other optimizations in place like column pruning, it’s a terribly inefficient query on a large dataset. Not only is it wasteful in terms of compute resources, but it’s also challenging to get meaningful insights if you query too much data.
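The effect of a time filter on partition pruning can be sketched in a few lines of Python. This is a simplified model and not Hydrolix’s actual storage engine; the Partition class, the prune function, and the sample timestamps are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    min_ts: int  # earliest timestamp in the partition (epoch seconds)
    max_ts: int  # latest timestamp in the partition (epoch seconds)


def prune(partitions, start=None, end=None):
    """Return only the partitions whose time range overlaps [start, end]."""
    if start is None or end is None:
        # No time filter: nothing can be pruned, so every partition is opened.
        return list(partitions)
    return [p for p in partitions if p.max_ts >= start and p.min_ts <= end]


partitions = [
    Partition("p0", 0, 3599),
    Partition("p1", 3600, 7199),
    Partition("p2", 7200, 10799),
]

print([p.name for p in prune(partitions, start=3700, end=4000)])  # ['p1']
print(len(prune(partitions)))  # 3
```

With a time filter, only the one overlapping partition is read. Without it, all three are opened, which is the case the circuit breaker is meant to block.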

Fortunately, you can change this setting with a quick API call. If you need to override a circuit breaker in an individual query, you can do so by specifying the circuit breaker setting at the query level.

To update the setting, make a PUT request to the endpoint https://{hostname}/config/v1/orgs/{org_id}/query_options/ with a JSON body that sets hdx_query_timerange_required to true.
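As a rough sketch, the request could be built in Python as follows. The endpoint and the option name come from this post, but the nesting of the JSON body under "settings" and "default_query_options", the bearer token authentication, and the sample hostname, org ID, and token are assumptions. Check them against the circuit breaker docs before running this against a real cluster.

```python
import json
import urllib.request


def build_query_options_request(hostname: str, org_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that enables the circuit breaker."""
    url = f"https://{hostname}/config/v1/orgs/{org_id}/query_options/"
    # ASSUMPTION: the body shape below is a guess at the expected structure.
    body = {
        "settings": {
            "default_query_options": {"hdx_query_timerange_required": True}
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            # ASSUMPTION: bearer token authentication.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
request = build_query_options_request("hostname.example.com", "my-org-id", "my-token")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body first, which is worth doing for a setting that affects every query in the organization.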

With this setting on, a query without a time filter will return the following error: HdxStorageError hdx_query_timerange_required is set to true.

Circuit Breaker Best Practices

What about other query settings? Here are some additional considerations to keep in mind to reduce wasteful query compute.

  • Raw tables can hold a lot of data, so you might want to set hdx_query_max_timerange_sec to ensure that users aren’t retrieving data over long periods of time, which would produce very large result sets. Instead, query summary tables when you need data from long time periods. Summary tables hold much less data and are more efficient to query. You can override the system-level settings at the query level for summary table queries.
  • You can’t rely on LIMIT to reduce the amount of data a query processes. Even if you add LIMIT 100 to a query and only retrieve 100 rows, Hydrolix may still process billions of rows before applying the limit. Instead, you can use the circuit breaker hdx_query_max_rows to enforce limits.
  • Some dashboard tools like Superset allow you to automatically append a settings string to a query. This can be particularly useful for creating more granular settings across different kinds of queries. For example, you might append a string to summary table queries that overrides hdx_query_max_timerange_sec.
  • If your system-level circuit breakers are too restrictive for some use cases, the best practice is to keep them in place and override them at the query level on a case-by-case basis. This small additional step will ensure that these queries happen by design, not by accident.
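A query-level override in the SQL statement can be sketched as a small helper that appends a settings string. It assumes a ClickHouse-style SETTINGS clause at the end of the statement. The helper name with_settings, the table name, and the limit values are hypothetical, so confirm the exact syntax in the circuit breaker docs.

```python
def with_settings(sql: str, **settings: int) -> str:
    """Append a SETTINGS clause with the given query options to a SQL statement."""
    if not settings:
        return sql
    clause = ", ".join(f"{name}={value}" for name, value in settings.items())
    # Drop trailing whitespace and a trailing semicolon before appending.
    return f"{sql.rstrip().rstrip(';')} SETTINGS {clause}"


# Hypothetical table name and limits, for illustration only.
raw_query = with_settings(
    "SELECT count() FROM my_project.raw_table WHERE timestamp > now() - INTERVAL 1 HOUR",
    hdx_query_max_rows=1_000_000,
    hdx_query_max_timerange_sec=86_400,
)
print(raw_query)
```

Keeping the override in the query text makes each exception to the system-level limits visible wherever the query is stored or reviewed.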

Ultimately, it will likely take trial and error to determine your circuit breaker settings. However, a few simple circuit breakers can potentially save your organization a lot of money, especially if you combine them with precision query scaling and custom views. Hydrolix is a powerful, efficient tool for analyzing big data, but you still need to safeguard against wasteful compute. Analyzing big data requires a different approach to querying, and cloud costs can quickly spiral if you aren’t being mindful.

Next Steps

Learn about other Hydrolix best practices for reducing query compute and cloud spend:

Read the circuit breaker docs.

Learn more about Hydrolix and contact us for a POC.
