
4 Observability “Best Practices” That Are Really Just Data Loss

Learn about four observability anti-practices offered as solutions to save money—when the real issue is that most platforms cost too much.

Franz Knupfer

Published:

Mar 05, 2024

6 minute read

If data is the “new oil,” why is so much of it treated like old garbage? Practices like sampling and data aggregation are touted as observability “best practices.” The reality is far simpler: they involve throwing your data away.

The real problem is skyrocketing volumes of log data and legacy observability platforms that weren’t built with petabyte-scale volumes of data in mind. Many enterprises have to choose between astronomical costs and losing data.

Unfortunately, incumbent observability platforms have too much technical debt to easily lift-and-shift from costly legacy data layers to cloud-based, scalable, and cost-effective object storage. So the only way to drive down costs is to reduce the amount of data you ingest, store, and query. The issue of data loss has been repackaged as a series of best practices, which doesn’t solve the underlying problem.

Let’s take a look at four observability “best practices” that involve throwing your data away—and the downsides of each.

Transforming the Economics of Log Management with Next-Gen Cloud Data Platforms for Observability Use Cases

Learn why most observability platforms cost too much, and how modern cloud data platforms must push the limits of object storage to offer long-term, cost-effective log data.

Limited Retention Windows

High performance “hot” storage can be very costly with legacy observability platforms. As a result, customers have to adopt short data retention windows to reduce costs. You get performant query capacity, but only for a few months at most. Many, if not most, high-volume users retain data in hot storage for just a few weeks or even days. Then the data is either discarded or moved into “cold” storage where it’s difficult to access.

The reasoning is that recent data is most important because you can use it to quickly detect and fix problems before they impact end users. That’s a great rationale for keeping recent data, but it’s not a good one for discarding older data. In addition to being a rich trove of information for data science and AI use cases, it could contain valuable historical insights, information about undetected security breaches, and answers to business questions you haven’t thought to ask yet.

But wait… isn’t the best practice to move data into more affordable cold storage, not just throw it away? That way, you can keep your data and pay less for it. However, data in cold storage must be rehydrated before it can be queried, which rarely happens in practice. The data becomes dark data that’s kept only for security and compliance purposes. You’re effectively throwing it away… except you’re paying for long-term storage (just in case) before you throw it away for good a few years down the road. 52% of the average data storage budget goes to dark data. That’s a lot to spend on data that’s not actionable.
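To make the lifecycle concrete, here is a minimal sketch of the kind of age-based tiering rule described above. The tier names and thresholds are hypothetical, not any particular vendor’s defaults.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: hot for 14 days, cold for a year, then deleted.
# The thresholds are illustrative only.
HOT_DAYS = 14
COLD_DAYS = 365

def storage_tier(event_time, now=None):
    """Return the tier a log event lands in under this age-based policy."""
    now = now or datetime.now(timezone.utc)
    age = now - event_time
    if age <= timedelta(days=HOT_DAYS):
        return "hot"      # fast, queryable storage
    if age <= timedelta(days=COLD_DAYS):
        return "cold"     # must be rehydrated before it can be queried
    return "deleted"      # gone for good

# A month-old incident is already out of reach for ad hoc queries.
print(storage_tier(datetime.now(timezone.utc) - timedelta(days=30)))  # -> "cold"
```

In practice the “cold” branch behaves a lot like the “deleted” one: the data exists, but nobody queries it.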

Sampling

Another approach to limiting incoming data is sampling, sometimes called decimation. This involves keeping a percentage of the data and discarding the rest. To make matters worse, sampling is often randomized, and there’s no way to know which data is being discarded. Does it contain information about a low-and-slow attack from a malicious actor? Logs that would reveal the root cause of an outage or breach? Or maybe repeat visits from an IP address, suggesting that your video content is being repackaged and pirated?
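To see just how blind that decision is, here’s a minimal sketch of head-based random sampling. The 10% rate and the field names are hypothetical; the point is that the keep-or-drop choice never looks at what the record contains.

```python
import random

SAMPLE_RATE = 0.10  # keep roughly 10% of events; the rate is illustrative

def should_keep(event):
    """Random head sampling: the decision ignores the event's content entirely."""
    return random.random() < SAMPLE_RATE

events = [
    {"ip": "203.0.113.7", "path": "/video/123", "status": 200},
    {"ip": "198.51.100.9", "path": "/login", "status": 401},  # the failed login you may need later
]

kept = [e for e in events if should_keep(e)]
# Roughly 90% of records, including potential evidence of abuse, never reach storage.
```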

There are two major reasons why sampling is common. The first is to control costs—there will be lower storage and compute costs simply because there is less data. The second is that some incumbent observability platforms simply can’t handle terabyte volumes of real-time data without major performance degradation. If you have to sample data to make your observability solution affordable and performant, it’s probably time to find a new solution.

Limiting Dimensionality

High-dimensionality data can have hundreds of attributes. Those attributes provide valuable context, but they also increase storage costs and degrade query performance on many traditional observability platforms. The storage cost is easy to explain: more attributes mean more data. The performance problem is specific to observability solutions built on row-oriented databases, which have to read every attribute of every row they scan, even when a query only touches a handful of columns. Wide, high-dimensionality tables therefore become inefficient to query.
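A rough back-of-envelope calculation shows why this hurts. The figures below (one billion events, 300 attributes, 20 bytes per value, a query touching two columns) are made-up round numbers, and they ignore compression, but the ratio is the point.

```python
# Illustrative round numbers for a wide event table.
rows = 1_000_000_000        # one billion log events
attributes = 300            # high-dimensionality events
bytes_per_value = 20        # average encoded size of one attribute value
columns_needed = 2          # a typical query touches only a couple of attributes

row_store_scan = rows * attributes * bytes_per_value         # reads every attribute of every row
column_store_scan = rows * columns_needed * bytes_per_value  # reads only the columns the query needs

print(f"row store:    {row_store_scan / 1e12:.2f} TB scanned")    # 6.00 TB
print(f"column store: {column_store_scan / 1e12:.2f} TB scanned") # 0.04 TB
```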

To “solve” this problem, the onus is on end users to limit the dimensionality of their data. Because of design limitations in traditional observability solutions, users must compromise the granularity of their data. Which of your teams gets the fun task of sifting through hundreds of columns and deciding which ones are hopefully not important? Trimming a few columns may not be so challenging, but what happens when you need to make hard choices about which data attributes to keep? And what happens when you need that data in the future and it’s gone?
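At the pipeline level, that compromise usually looks something like the sketch below: an attribute allowlist applied before ingest. The field names are hypothetical; everything outside the allowlist is gone before it ever reaches storage.

```python
# Hypothetical allowlist: someone had to decide these are the attributes that "matter".
ALLOWED_ATTRIBUTES = {"timestamp", "status", "path", "duration_ms"}

def strip_dimensions(event):
    """Drop every attribute that isn't on the allowlist before ingest."""
    return {k: v for k, v in event.items() if k in ALLOWED_ATTRIBUTES}

event = {
    "timestamp": "2024-03-05T12:00:00Z",
    "status": 500,
    "path": "/checkout",
    "duration_ms": 2140,
    "trace_id": "a1b2c3",    # dropped: good luck correlating with traces later
    "cache_status": "MISS",  # dropped: the attribute that might explain the slowdown
}

print(strip_dimensions(event))  # only the four allowlisted fields survive
```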

There are valid reasons you may not want to retain some data attributes, but it shouldn’t just be a required cost-cutting measure.

Data Aggregation

Dashboards that aggregate data can be very helpful. You get a bird’s eye view of system metrics on performance, errors, number of requests, and more. Then ideally you can drill down into individual metrics to see what’s going on in each part of your system.

However, it’s much more problematic when observability platforms aggregate data into generalized metrics and then discard the underlying data. You lose the granularity of that data and you can’t drill down anymore. Data aggregation can give you a basic overview of application performance six months ago, but you can’t dive deeper to understand the how and the why. If you have a hypothesis, you will not be able to prove it. And if you have deeper business questions about the data, you will not be able to answer them.

Data aggregation gives you a very narrow slice of generalized data. What happens when you realize later that you want your data sliced a different way? You need the underlying data to make that aggregation. If it’s gone, you won’t be able to look at historical data from different angles and gain a fresh perspective.
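Here’s a small sketch of what that loss looks like, assuming a hypothetical one-minute rollup of request durations. Once the raw events are gone, even a simple follow-up question such as “which region was slow?” can no longer be answered from the rollup.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw request events; in production these would number in the billions.
raw_events = [
    {"minute": "12:00", "path": "/checkout", "region": "eu", "duration_ms": 2100},
    {"minute": "12:00", "path": "/home",     "region": "us", "duration_ms": 45},
    {"minute": "12:00", "path": "/checkout", "region": "us", "duration_ms": 90},
]

# Pre-aggregation: keep one average per minute, then discard the raw events.
durations_by_minute = defaultdict(list)
for e in raw_events:
    durations_by_minute[e["minute"]].append(e["duration_ms"])
per_minute_avg = {minute: mean(d) for minute, d in durations_by_minute.items()}

print(per_minute_avg)  # {'12:00': 745}: the slow EU checkout is invisible in the average
# Re-slicing by region or path, or computing a p99, needs raw data that no longer exists.
```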

The Real Best Practice: Long-Term, Cost-Effective Hot Data

Each of these so-called best practices pushes the shortcomings of an observability platform onto the end user, then repackages them as features instead of bugs. If you’re using an observability solution that can’t manage your data in a cost-effective manner, it’s time to rethink your solution.

Modern cloud data platforms like Hydrolix use commodity object storage that’s pay-as-you-go and horizontally scalable, giving you long-term, cost-effective storage. Hydrolix maximizes the strengths of object storage and uses massive parallelism, high-density compression, and advanced query features such as micro-indexing to ensure that ingest and query are performant at any scale.

There are reasons you might want to use the practices outlined in this post for certain data use cases, but if you’re forced to do it because your observability platform costs too much, it’s not a best practice. You shouldn’t have to run up your bill just because your platform can’t pay down its technical debt.

Next Steps

Read Transforming the Economics of Log Management to learn about the issues facing many of today’s observability platforms, and how next-generation cloud data platforms must maximize the benefits of object storage to make log data cost-effective.

Learn more about how Hydrolix offers cost-effective data at terabyte scale and contact us for a POC.
