
Eating our own logs

Published: October 12, 2022

Updated: October 12, 2022

Author: David Sztykman


Monitoring a complex application deployed via Kubernetes can be a challenge, especially when the application uses a large number of components.

The Hydrolix platform is designed to make log processing and filtering easy. That makes it an ideal candidate to eat its own dog food or, in this case, its own logs.

In our latest release, we automatically index our own logs into the platform, and we publish Grafana dashboards to help users improve observability of our stack.

Components within Hydrolix are mostly stateless and workload independent so having a good view of systems is important. Components include:

Logs are not the be-all and end-all: Hydrolix also provides Prometheus and other metrics, and information on these can be found in our docs.

Getting logs from K8s

When Hydrolix is deployed, Vector is provisioned on each node as a DaemonSet. More details about deploying Vector on K8s can be found here.

Every pod deployed by the Hydrolix operator is configured to use the k8s logging system, and Vector is configured to collect logs from k8s and add metadata.

This means that each log line carries its k8s annotations, pod name, namespace, etc.
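As a rough illustration of what an enriched log line can look like, here is a minimal Python sketch. The field names under `kubernetes` follow what Vector's `kubernetes_logs` source emits by default, to the best of our knowledge; the values (pod name, namespace, annotation) are invented, and the `flatten` helper is hypothetical and not part of Hydrolix or Vector. The actual columns in the Hydrolix table may differ.

```python
import json

# Hypothetical enriched event: field names modelled on Vector's
# kubernetes_logs source, values invented for illustration only.
event = {
    "timestamp": "2022-10-12T09:15:42Z",
    "message": "query completed in 182 ms",
    "stream": "stdout",
    "kubernetes": {
        "pod_name": "query-head-0",
        "pod_namespace": "hydrolix",
        "container_name": "query-head",
        "pod_annotations": {"example.com/owner": "platform"},
    },
}


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. kubernetes.pod_name."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested metadata, extending the dotted prefix.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


flat_event = flatten(event)
print(json.dumps(flat_event, indent=2))
```

Flattening the metadata this way is one simple option for turning each piece of k8s context into its own filterable field.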

Logs are both written to cloud storage (S3, GCS, etc.) and indexed into a Hydrolix table: hydro.logs.

This is done by sending logs from Vector directly into a Redpanda queueing system that is part of our core platform. Logs have their own topic within Redpanda and are pulled by a built-in Kafka service that writes the database entries.

With this approach, logs take up to 10 seconds to become available within the database.

There is an added benefit to keeping these logs within Hydrolix. Thanks to the patented indexing and compression techniques Hydrolix employs, we’ve seen compression ratios of 52:1 in benchmarks.

Figure: Hydrolix compression ratio of 52:1

To put this into context:

In one of our test systems we currently have 65.16 million log lines.

This equates to a raw data size of around 73.58 GB.

When imported into Hydrolix the data and index compress to a total of ~1.5GB!
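The arithmetic behind these test-system figures can be checked with a short Python sketch. It assumes 1 GB = 10^9 bytes and uses the rounded ~1.5GB stored size, so the ratio comes out at about 49:1, in the same range as the benchmark figure. The function names are illustrative only.

```python
# Figures quoted above for the test system.
LOG_LINES = 65.16e6   # number of log lines
RAW_GB = 73.58        # raw (uncompressed) size in GB
STORED_GB = 1.5       # approximate stored size (data + index) in GB


def compression_ratio(raw_gb, stored_gb):
    """Raw size divided by stored size."""
    return raw_gb / stored_gb


def bytes_per_line(size_gb, lines):
    """Average bytes per log line, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 / lines


ratio = compression_ratio(RAW_GB, STORED_GB)
raw_bpl = bytes_per_line(RAW_GB, LOG_LINES)
stored_bpl = bytes_per_line(STORED_GB, LOG_LINES)

print(f"compression ratio: about {ratio:.0f}:1")
print(f"raw bytes per log line: about {raw_bpl:.0f}")
print(f"stored bytes per log line: about {stored_bpl:.0f}")
```

In other words, a log line averaging a little over 1 KB of raw text ends up costing a few tens of bytes once stored and indexed.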

To extrapolate: if I store, say, 1TB of uncompressed log data a day in a production system, that equates to around 20GB a day of Hydrolix stored data. Put another way, I can keep 52 days of data for the price of 1!

Storing is great, but logs need to be useful!

Hydrolix indexes everything by default, so any information included in the logs is available to query. Add in the full-text search capability, which lets filters match specific keywords within messages, and you can run fast, detailed searches across stack traces, messages or exceptions.
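To make this concrete, here is a minimal Python sketch that builds a keyword search over recent logs. It rests on several assumptions: the column names (`timestamp`, `pod_name`, `namespace`, `message`) are hypothetical, a plain SQL `LIKE` filter stands in for the full-text search, and the `now() - INTERVAL` syntax assumes a ClickHouse-style SQL dialect. The quoting helper only doubles single quotes, so treat it as an illustration rather than a safe way to handle untrusted input.

```python
def sql_quote(value):
    """Quote a string literal by doubling single quotes (standard SQL)."""
    return "'" + value.replace("'", "''") + "'"


def build_log_search(keyword, namespace, minutes=15,
                     table="hydro.logs", limit=100):
    """Build a keyword search over recent logs.

    Column names are hypothetical; adjust them to the real table schema.
    """
    pattern = sql_quote(f"%{keyword}%")
    return (
        f"SELECT timestamp, pod_name, message\n"
        f"FROM {table}\n"
        f"WHERE timestamp > now() - INTERVAL {int(minutes)} MINUTE\n"
        f"  AND namespace = {sql_quote(namespace)}\n"
        f"  AND message LIKE {pattern}\n"
        f"ORDER BY timestamp DESC\n"
        f"LIMIT {int(limit)}"
    )


query = build_log_search("Traceback", namespace="hydrolix")
print(query)
```

Restricting the time window and namespace first keeps the keyword match confined to a small slice of the data.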

To help with this, we have created a few simple Grafana dashboards that are available for our customers to use:

Grafana Logs Debugging session

In the next blog post, we’ll see how we use our logs to improve query monitoring and performance.
