Logs and metrics with Fluent Bit

Published: October 3, 2022

Updated: October 7, 2022

Author: David Sztykman

Fluent Bit is a super fast, lightweight, and highly scalable logging and metrics processor and forwarder.

In this article we’ll see how to configure Fluent Bit to send metrics and logs to Hydrolix, and how to visualise them in Grafana.

Deploying and configuring Fluent Bit

In this example I’ll set up Fluent Bit on an AWS Linux EC2 machine. To do so I followed this guide.

Installing on Linux is straightforward:

Once Fluent Bit is installed, the configuration is pretty basic: I have enabled a couple of [INPUT] sections.

The Fluent Bit configuration file is located at /etc/fluent-bit/fluent-bit.conf:

Example FluentBit Configuration

This configuration collects CPU, memory, network and disk usage, systemd logs and AWS metadata.
To keep the data cleaner I’m using the nest filter, which groups related keys into a nested JSON object.
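
To make the nesting step more concrete, here is a minimal Python sketch of what a nest operation does to a single record. It is only an illustration, not Fluent Bit's implementation: the function name `nest`, the key `memory` and the sample values are assumptions made for this example.

```python
from fnmatch import fnmatchcase


def nest(record, wildcard, nest_under):
    """Move every key matching `wildcard` under a new `nest_under` key."""
    nested = {k: v for k, v in record.items() if fnmatchcase(k, wildcard)}
    if not nested:
        return dict(record)  # nothing matched, return a plain copy
    flat = {k: v for k, v in record.items() if k not in nested}
    flat[nest_under] = nested
    return flat


# Hypothetical flat record, loosely shaped like memory metrics.
record = {"Mem.total": 1000, "Mem.used": 400, "hostname": "host-1"}
nested_record = nest(record, "Mem.*", "memory")
print(nested_record)
```

The flat memory keys end up grouped under a single `memory` object, which is the kind of JSON blob the transform can then address as one field.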

Data example

We can index this data using the following transform:

Transform FluentBit

An important feature to note is that we are going to use Hydrolix’s new full text search capability.

This capability is applied to the MESSAGE field and indexes the individual words within each message.
It lets us search for specific occurrences of a word at query time without doing full column scans, so that we only read rows (or blocks) that include the keyword(s).
Using this feature for the MESSAGE column gives faster query response times, better CPU utilisation and reduced bandwidth consumption.
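
The idea of a word index can be sketched with a toy inverted index in Python. This is not how Hydrolix stores its index; the whitespace separator, the function name `build_index` and the log lines are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(messages, separator=" "):
    """Map each lowercased word to the set of row numbers that contain it."""
    index = defaultdict(set)
    for row, message in enumerate(messages):
        for word in message.lower().split(separator):
            if word:
                index[word].add(row)
    return index


# Hypothetical log lines.
messages = [
    "Started session for user root",
    "Connection error while contacting upstream",
    "Stopped session for user root",
]
index = build_index(messages)

# Look up the word directly instead of scanning every message.
matching_rows = sorted(index.get("error", set()))
print([messages[row] for row in matching_rows])
```

A lookup for a word only touches the rows listed for it, which is why a keyword search does not need to scan the whole column.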

By default our full text search creates a separate index, and the column is split into words using the following separator:

So a SQL query like the following will look for *error* in the log message:

Hydrolix uses its own compression algorithm. Even with full text search (and the additional string dictionaries in the indices that it may involve), the system achieves excellent compression ratios:

  • “total_rows”: “266.55 million”
  • “raw_data_size”: “68.15 GiB” -> size of the data sent by Fluent Bit
  • “hdx_data_size”: “1.88 GiB” -> HDX data format
  • “hdx_index_size”: “570.80 MiB” -> HDX index data
  • “compression_ratio”: 27.9

Hydrolix Compression ratio
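
As a sanity check on these figures, the short Python calculation below assumes the ratio is the raw size divided by the combined size of the HDX data and HDX index. That formula is an assumption, but it matches the reported value to within rounding of the displayed sizes.

```python
MIB_PER_GIB = 1024

raw_gib = 68.15                        # raw_data_size
hdx_data_gib = 1.88                    # hdx_data_size
hdx_index_gib = 570.80 / MIB_PER_GIB   # hdx_index_size converted to GiB

# Assumption: stored size = data + index.
stored_gib = hdx_data_gib + hdx_index_gib
ratio = raw_gib / stored_gib
print(f"stored: {stored_gib:.2f} GiB, ratio: {ratio:.2f}")
```

Dividing by the data size alone would give a ratio above 36, so the reported 27.9 appears to include the index overhead.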

Grafana Visualisation

After deploying Fluent Bit in your infrastructure, you can use Grafana for data visualisation and alerting.

Grafana FluentBit example

In this dashboard we are using the ClickHouse plugin for Grafana, which allows us to write SQL statements and get their results. For more information on how to set up Hydrolix with Grafana, please have a look here.

To create the above dashboard we create a new dashboard within Grafana and configure the following 3 variables:

  • A list of EC2 instance IDs
  • A list of network interfaces for the selected EC2 instance
  • And finally a text box to look for a text pattern in the logs.

EC2 Instance ID List

This SQL query uses a built-in filter to limit the execution of the statement to the time range of the dashboard.

For example, if your dashboard is set to the last 1h and your time column is called timestamp, the macro $__timeFilter(timestamp) will be replaced with:
WHERE timestamp >= '1664800456' AND timestamp <= '1664804056'
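
The substitution can be mimicked in a few lines of Python. This is a sketch of the string expansion described above, not the plugin's code; the function name `time_filter` is hypothetical, and the real plugin may format the bounds differently.

```python
def time_filter(column, start_epoch, end_epoch):
    """Build the range predicate for a dashboard time window (epoch seconds)."""
    return f"{column} >= '{start_epoch}' AND {column} <= '{end_epoch}'"


# Dashboard set to "last 1h", ending at the upper bound quoted above.
end_epoch = 1664804056
start_epoch = end_epoch - 3600
predicate = time_filter("timestamp", start_epoch, end_epoch)
print("WHERE " + predicate)
```

The two bounds in the example differ by exactly 3600 seconds, which is the one-hour window selected in the dashboard.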

Network Interface

The second filter selects the network interface. Technically the network interface is a map(network_interface, value), so here we retrieve all the network interface keys.

We use another built-in macro to optimise calls to the database by limiting the columns to scan when ALL is selected. The following clause:
AND $__conditionalAll(ec2_instance_id IN ( ${ec2_instance_id:singlequote} ), $ec2_instance_id)

means that if my variable $ec2_instance_id is ‘all’, this filter will be replaced by AND 1=1.

If the user has selected a value, the query predicate will look like:
ec2_instance_id IN ('$ec2_instance_id')
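
The behaviour of the two cases can be sketched in Python as follows. The helper names `conditional_all` and `in_list` and the instance IDs are hypothetical, and the sketch skips the quote escaping that a real implementation would need.

```python
def conditional_all(condition, selected_values, all_values):
    """Return `condition` unless the selection covers every value."""
    if set(selected_values) == set(all_values):
        return "1=1"
    return condition


def in_list(column, values):
    """Render `column IN ('a','b')` with single-quoted values (no escaping)."""
    quoted = ",".join(f"'{v}'" for v in values)
    return f"{column} IN ({quoted})"


all_ids = ["i-aaa", "i-bbb"]  # hypothetical instance IDs

for selected in (all_ids, ["i-aaa"]):
    condition = in_list("ec2_instance_id", selected)
    print("AND " + conditional_all(condition, selected, all_ids))
```

When everything is selected the predicate collapses to a constant, so the database has no reason to read the ec2_instance_id column for filtering.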

For example, to use these settings in a query that retrieves the average CPU usage of every host, we would specify it as:

CPU Usage Example

This query gets the average CPU usage of every host, unless the user drills down to a specific EC2 instance.

Finally, we need to create the full text search for the MESSAGE column:

Logs with full text search

The SQL statement above retrieves all the MESSAGE data. We use the ClickHouse if function to optimise the search and skip the predicate when the default “all” string is supplied.

AND if('${log:text}' = 'all', true, MESSAGE LIKE '%${log:text}%')

This means that if the variable log contains the string ‘all’, we do not apply the rest of the statement, i.e. MESSAGE LIKE '%${log:text}%' is ignored.
This is so we don’t search the MESSAGE column for the literal word ‘all’.
However, if the value is something other than ‘all’, we apply the predicate MESSAGE LIKE ‘%the text in the box%’.
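
The same decision can be written as a small Python function evaluated per row. It is a simplified sketch: the name `matches_log_filter` and the sample messages are hypothetical, and it treats the search text as a plain substring, whereas LIKE would also interpret any % or _ inside the text as wildcards.

```python
def matches_log_filter(message, text):
    """Mimic if(text = 'all', true, MESSAGE LIKE '%text%') for one row."""
    if text == "all":
        return True  # default value: keep every row, no text matching
    return text in message


# Hypothetical log messages.
messages = ["disk error on /dev/xvda", "session opened", "all good"]

kept_default = [m for m in messages if matches_log_filter(m, "all")]
kept_error = [m for m in messages if matches_log_filter(m, "error")]
print(kept_default)
print(kept_error)
```

With the default value every row is kept, including rows that do not contain the word ‘all’, while any other value narrows the result to matching messages.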

For example, if we want to search for 'error':

This gives us the output of a full text search looking for ‘error’ in the message field.
By using Hydrolix’s full text search capabilities, these kinds of data interactions (e.g. debugging or analysis of log data) become significantly faster and more efficient than before.
