RSS

High performance dashboard with Grafana

In this blog post we’ll see how to leverage Clickhouse plugin and Grafana to create really performant dashboard on Hydrolix data platform. Dashboard Variables Grafana has a concept of variable built-into the dashboard, those variables can then be used to filter the data. We recommend limiting those variables to either fixed value pre-defined in advance,…

David Sztykman

Published:

Nov 03, 2022

4 minute read
, ,

In this blog post we’ll see how to leverage Clickhouse plugin and Grafana to create really performant dashboard on Hydrolix data platform.

Dashboard Variables

Grafana has a concept of variable built-into the dashboard, those variables can then be used to filter the data. We recommend limiting those variables to either fixed value pre-defined in advance, or limit the value selection to the timeframe of the dashboard.

Lets use HTTP status codes as an example, we know in advance all the status code that exists, we don’t necessarily need to query the data in Hydrolix to get those. What we can do is the following query:

This will generate a list of HTTP status codes directly in the variable:

This concept is really important and interesting, as it avoids queries on cloud storage to get those values.

Indeed if we were doing query on the column status_code it would requires downloading part of the partition to list the different status code. With that example we are leveraging only local data to generate the list of http status code.

If you need to have some variable values defined by querying the Hydrolix data (for example User-Agent string) you have to limit the selection of those values to the time range of your dashboard. Otherwise this will generate a full column scan which requires opening every partition and is very expensive (not to mention offer users values outside of the time bounds).

Here we are leveraging the built-in macro $__timeFilter (for seconds resolution), $__timeFilter_ms (for milliseconds resolution). This built-in macro will automatically add the time range filtering using the column specified.

In my example we are using a timestamp column name and it’s stored in milliseconds.

This example also avoids using SELECT DISTINCT but uses GROUP BY instead, it’s more efficient and less memory consuming.

Last example is using a dictionary file to SELECT values. If you are using dictionary with Hydrolix you can specify the filename uploaded to your cluster, you can then leverage the following statement to open the dictionary file and use it to select value:

In this example it means we have uploaded a dictionary file called country.csv and it’s a CSVWithNames that Hydrolix is reading locally. This approach is very useful when you have a lot of values, as we don’t open data from cloud storage and dictionary are in memory.

Improve your SQL statement

Do not include variable not used in your SQL Statement, otherwise it’ll impact the performance of your charts.

If you use a drop down filter variable with multiple elements you can use the built-in variable $__conditionalAll
It’ll check that the variable is All, if all is selected it’ll then replace the statement with 1=1.

If you don’t leverage this function then the SQL statement generated will automatically include all the possible value from the filter, which makes it very difficult to read and expensive.

Example of usage:

This example means that if the users SELECT all the http status code I won’t have any filters, the SQL statement generated will be:

If the variable is selecting 200 and 304 status code then the statement will be rewritten as:

Unfortunately text-box don’t work with $__conditionalAll, so if you want to use those a free text search and filter with a regex or LIKE function for example you need to use the following in your query:

With the following text box:

If the value of the text box is empty we are not applying the SQL statement session_id LIKE '%${session_id:text}%’.
If the value is different than empty then we’ll apply the SQL statement.

A full query example would be something like:

This leverage all the different variables define above.

Add username in query settings

Hydrolix provides a real time view of the query running in the cluster, we can add extra information to the query running by specifying extra information in the SETTINGS of the query.

In this view you can view the query running and also the username in the admin_comment section. To add the username automatically you can modify your SQL statement and append the following:


This will get the login used by the user and add that information as an admin comment which is then displayed in the active query section.

So to summarise your query for your chart should look like the following:

Share this post…

Ready to Start?

Cut data retention costs by 75%

Give Hydrolix a try or get in touch with us to learn more

In this blog post we’ll see how to leverage Clickhouse plugin and Grafana to create really performant dashboard on Hydrolix data platform.

Dashboard Variables

Grafana has a concept of variable built-into the dashboard, those variables can then be used to filter the data. We recommend limiting those variables to either fixed value pre-defined in advance, or limit the value selection to the timeframe of the dashboard.

Lets use HTTP status codes as an example, we know in advance all the status code that exists, we don’t necessarily need to query the data in Hydrolix to get those. What we can do is the following query:

This will generate a list of HTTP status codes directly in the variable:

This concept is really important and interesting, as it avoids queries on cloud storage to get those values.

Indeed if we were doing query on the column status_code it would requires downloading part of the partition to list the different status code. With that example we are leveraging only local data to generate the list of http status code.

If you need to have some variable values defined by querying the Hydrolix data (for example User-Agent string) you have to limit the selection of those values to the time range of your dashboard. Otherwise this will generate a full column scan which requires opening every partition and is very expensive (not to mention offer users values outside of the time bounds).

Here we are leveraging the built-in macro $__timeFilter (for seconds resolution), $__timeFilter_ms (for milliseconds resolution). This built-in macro will automatically add the time range filtering using the column specified.

In my example we are using a timestamp column name and it’s stored in milliseconds.

This example also avoids using SELECT DISTINCT but uses GROUP BY instead, it’s more efficient and less memory consuming.

Last example is using a dictionary file to SELECT values. If you are using dictionary with Hydrolix you can specify the filename uploaded to your cluster, you can then leverage the following statement to open the dictionary file and use it to select value:

In this example it means we have uploaded a dictionary file called country.csv and it’s a CSVWithNames that Hydrolix is reading locally. This approach is very useful when you have a lot of values, as we don’t open data from cloud storage and dictionary are in memory.

Improve your SQL statement

Do not include variable not used in your SQL Statement, otherwise it’ll impact the performance of your charts.

If you use a drop down filter variable with multiple elements you can use the built-in variable $__conditionalAll
It’ll check that the variable is All, if all is selected it’ll then replace the statement with 1=1.

If you don’t leverage this function then the SQL statement generated will automatically include all the possible value from the filter, which makes it very difficult to read and expensive.

Example of usage:

This example means that if the users SELECT all the http status code I won’t have any filters, the SQL statement generated will be:

If the variable is selecting 200 and 304 status code then the statement will be rewritten as:

Unfortunately text-box don’t work with $__conditionalAll, so if you want to use those a free text search and filter with a regex or LIKE function for example you need to use the following in your query:

With the following text box:

If the value of the text box is empty we are not applying the SQL statement session_id LIKE '%${session_id:text}%’.
If the value is different than empty then we’ll apply the SQL statement.

A full query example would be something like:

This leverage all the different variables define above.

Add username in query settings

Hydrolix provides a real time view of the query running in the cluster, we can add extra information to the query running by specifying extra information in the SETTINGS of the query.

In this view you can view the query running and also the username in the admin_comment section. To add the username automatically you can modify your SQL statement and append the following:


This will get the login used by the user and add that information as an admin comment which is then displayed in the active query section.

So to summarise your query for your chart should look like the following: