RSS

Monitoring query performance

In the previous blog post eating our own logs we saw how Hydrolix cluster is indexing its own logs. Our logs contains lots of very useful information, in this blog post we’ll see how we can use those logs to create a query performance monitoring dashboard. Define KPI Before we start we need to think…

David Sztykman

Published:

Oct 21, 2022

4 minute read
, , ,

In the previous blog post eating our own logs we saw how Hydrolix cluster is indexing its own logs.

Our logs contains lots of very useful information, in this blog post we’ll see how we can use those logs to create a query performance monitoring dashboard.

Define KPI

Before we start we need to think a little bit about what are the KPI we need to monitor, some are pretty obvious:

  • Query rate
  • Error rate
  • Response time

But what does impact response time? What are the other informations we have in our logs to help us address performance?

Monitoring the number of partitions used by a query is important, looking at the bytes read, rows read and memory usage for a given query is also critical.

Let’s look at an example of query log indexed into Hydrolix:

As you can see our logs contains lots of detailed informations about the query performance.

Now let’s dive into Grafana and start building !

Creating the dashboard

Define dashboard filters

To be able to identify bad query performance we want to be able to filter through our logs very quickly.

We want to be able to quickly find queries that are using the most memory, partitions that have bad response time.

Grafana let you define variable, in our query performance monitoring dashboard we are using the following:

Grafana variable

Let’s take a look at the memory_usage_percentile:

It’s a custom type which list the different percentile we want to be able to filter on.

Now let’s see how we can use that variable into a SQL statement to actually find the value in question.

To do that we are creating a second Grafana variable which is hidden from the user and will do the SQL query to get the value:

SQL query to get memory_usage_percentile

Here we are creating a new variable called memory_usage_bytes which is running the SQL statement:

This SQL statement is leveraging our previous variable ${memory_usage_percentile} so when the user select the 0.95 value for example this will automatically calculate the 95th percentile of memory usage.

The same principle is being applied for the num_partitions_percentile but this time using the column num_partitions.

Response time is another example where we define the interval we want to filter on:

Response time filter

Here we define interval where the first part is what users will select and the second part will be the actual value for the variable response_time

Hydrolix query contains lots of different options such as admin comment, Grafana and other tool can leverage those to include information about users, see more details on our docs.
This allow our performance monitoring dashboard to be able to filter on specific users too.

User filtering variable

Here we are creating a new variable called user which is listing the hdx_query_admin_comment leveraging the following statement:

And finally we have free text search to look for specific pattern in query or errors, those are leveraging our full text search capability.

Create charts

Now that we have define all the criteria we want to use to filter our data, let’s create a chart leveraging all those variable.

Performance Chart

The first chart is actually counting the number of query, if it’s a success or an error. It’s leveraging the following statement:

This query leverages several built-in Grafana function

$__timeFilter_ms add the timerange filter from the dashboard.
$__timeInterval generate time interval to group by the data.
$__conditionalAll replace the value with 1=1 if the variable is set to All

So in our example we are leveraging an interesting principle:
AND $__conditionalAll(memory_usage_bytes >= $memory_usage_bytes, $memory_usage_percentile)

This means that if the variable $memory_usage_percentile is All we do nothing, but if the variable is set to something else then we are running the sql statement:
AND memory_usage_bytes >= $memory_usage_bytes
$memory_usage_bytes is our hidden variable which calculate automatically the memory_usage_bytes percentile.

Another interesting concept is the following:
AND if('${query_search:text}' = '', true, query LIKE '%${query_search}%')

This time we leverage if statement from SQL, so if the variable ${query_search:text} is empty the condition will be true and won’t run the SQL statement LIKE '%${query_search}%'.
But if the variable is not empty the evaluation won’t be true anymore and we will run the LIKE '%${query_search}%'.

Let’s see it in action

Share this post…

Ready to Start?

Cut data retention costs by 75%

Give Hydrolix a try or get in touch with us to learn more

In the previous blog post eating our own logs we saw how Hydrolix cluster is indexing its own logs.

Our logs contains lots of very useful information, in this blog post we’ll see how we can use those logs to create a query performance monitoring dashboard.

Define KPI

Before we start we need to think a little bit about what are the KPI we need to monitor, some are pretty obvious:

  • Query rate
  • Error rate
  • Response time

But what does impact response time? What are the other informations we have in our logs to help us address performance?

Monitoring the number of partitions used by a query is important, looking at the bytes read, rows read and memory usage for a given query is also critical.

Let’s look at an example of query log indexed into Hydrolix:

As you can see our logs contains lots of detailed informations about the query performance.

Now let’s dive into Grafana and start building !

Creating the dashboard

Define dashboard filters

To be able to identify bad query performance we want to be able to filter through our logs very quickly.

We want to be able to quickly find queries that are using the most memory, partitions that have bad response time.

Grafana let you define variable, in our query performance monitoring dashboard we are using the following:

Grafana variable

Let’s take a look at the memory_usage_percentile:

It’s a custom type which list the different percentile we want to be able to filter on.

Now let’s see how we can use that variable into a SQL statement to actually find the value in question.

To do that we are creating a second Grafana variable which is hidden from the user and will do the SQL query to get the value:

SQL query to get memory_usage_percentile

Here we are creating a new variable called memory_usage_bytes which is running the SQL statement:

This SQL statement is leveraging our previous variable ${memory_usage_percentile} so when the user select the 0.95 value for example this will automatically calculate the 95th percentile of memory usage.

The same principle is being applied for the num_partitions_percentile but this time using the column num_partitions.

Response time is another example where we define the interval we want to filter on:

Response time filter

Here we define interval where the first part is what users will select and the second part will be the actual value for the variable response_time

Hydrolix query contains lots of different options such as admin comment, Grafana and other tool can leverage those to include information about users, see more details on our docs.
This allow our performance monitoring dashboard to be able to filter on specific users too.

User filtering variable

Here we are creating a new variable called user which is listing the hdx_query_admin_comment leveraging the following statement:

And finally we have free text search to look for specific pattern in query or errors, those are leveraging our full text search capability.

Create charts

Now that we have define all the criteria we want to use to filter our data, let’s create a chart leveraging all those variable.

Performance Chart

The first chart is actually counting the number of query, if it’s a success or an error. It’s leveraging the following statement:

This query leverages several built-in Grafana function

$__timeFilter_ms add the timerange filter from the dashboard.
$__timeInterval generate time interval to group by the data.
$__conditionalAll replace the value with 1=1 if the variable is set to All

So in our example we are leveraging an interesting principle:
AND $__conditionalAll(memory_usage_bytes >= $memory_usage_bytes, $memory_usage_percentile)

This means that if the variable $memory_usage_percentile is All we do nothing, but if the variable is set to something else then we are running the sql statement:
AND memory_usage_bytes >= $memory_usage_bytes
$memory_usage_bytes is our hidden variable which calculate automatically the memory_usage_bytes percentile.

Another interesting concept is the following:
AND if('${query_search:text}' = '', true, query LIKE '%${query_search}%')

This time we leverage if statement from SQL, so if the variable ${query_search:text} is empty the condition will be true and won’t run the SQL statement LIKE '%${query_search}%'.
But if the variable is not empty the evaluation won’t be true anymore and we will run the LIKE '%${query_search}%'.

Let’s see it in action