
Sustainable Data Practices for Building a SaaS Platform

Learn best practices for building sustainability into your SaaS platform while lowering your total cloud spend.

Franz Knupfer

Published: Jan 02, 2024

9 minute read

Sustainability isn’t just a best practice; it’s a necessary practice. Setting aside the scientific necessity of remaining below critical climate thresholds, sustainability is now being built into regulations such as the European Corporate Sustainability Due Diligence Directive (CSDDD), which requires large enterprises to have a climate transition plan and mitigate their environmental impact. And most large enterprises now have ESG (environmental, social, and governance) directives to help guide them and to prove to customers that they’re doing their part.

If you’re building a SaaS platform with scalability in mind, how do you work towards sustainability when the goal is to keep growing your customer base, and with it, the amount of data in your systems? And how can you mitigate the environmental impact of your applications if you’re relying on cloud providers to power them? In this post, you’ll learn about steps to reduce the carbon footprint of your company, including building efficiency into your data layer.

Cloud Computing Is Energy Intensive

When it comes to cloud computing, sustainability can be challenging to visualize. Cloud compute and storage use hardware and servers that are often thousands of miles away. But cloud computing consumes a vast amount of energy and, by some estimates, produces more carbon emissions than the airline industry. That energy comes from the hardware used to compute, transfer, and store information in the cloud as well as the air conditioning needed to cool that hardware. It includes redundancy in the form of backup servers and energy sources to ensure that your data is always online. On top of that, enterprises and consumers are ingesting and creating more data than ever before.

While cloud providers are transitioning to renewable energy sources and using innovative approaches to make efficient chips and cooling techniques that use less energy, when you’re building on top of cloud infrastructure, you can’t directly tie your usage to a specific renewable resource. The cloud’s ubiquity, its ability to draw always-available resources from just about anywhere, is what makes it so powerful, but it also makes usage difficult to track precisely. As a result, it’s easy to forget just how resource-intensive cloud computing is, and it can be hard to identify and fix inefficiencies in your system.

Lowering Your Carbon Footprint in the Cloud

Most enterprises use cloud infrastructure for scalability, but when you do, you can’t choose which energy sources your applications use (at least not yet). Unless you build a data center on top of renewable resources like solar or geothermal, cloud infrastructure is a big improvement over on-prem because cloud providers use economies of scale to drive innovation in cooling, hardware, and more. On the flip side, you don’t have control over the hardware. That means the main lever you have to pull is efficiency, and the goal is to use as few resources as possible. The good news is that using fewer resources is also good for the bottom line.

Cost is a loose proxy for sustainability.

In his 2023 AWS re:Invent keynote on the frugal architect, Dr. Werner Vogels, CTO of Amazon, said that sustainability should be a priority for all companies and that cost is a proxy for sustainability. Cloud provider costs are usage-based, so reducing costs generally means reducing usage, which means that efforts to improve the bottom line can also boost sustainability. This is in contrast to the misperception that sustainability is both difficult and expensive to implement. Tie sustainability to your bottom line, because reducing cloud spend is good for both the planet and your budget.

Monitor your resource usage.

To understand how many resources you’re using, you need to measure them in the first place. It’s similar to having a visible thermostat in your house, an analogy that Dr. Vogels made in his re:Invent keynote. If the thermostat is visible, you’ll be aware of your energy consumption and use less energy. Conversely, if it’s not visible, you’ll use more energy. In the example that Dr. Vogels used (homes in Amsterdam), the difference in energy consumption between homes with visible thermostats and homes with thermostats in the basement is dramatic.

In distributed systems that use many services, often across multiple hosts and cloud providers, it can be challenging to get a holistic view of the resources your system is using. Use cloud provider dashboards as well as observability and log monitoring solutions to get a complete picture of resource usage throughout your system. By doing so, you can identify services that are using a disproportionate amount of resources and target inefficiencies.

As an example, you could compare overall requests and throughput against the cost of a service. You might discover services that are regularly idled or simply cost too much relative to their throughput. In the case of Hydrolix, you can monitor CDN performance and transaction logs from each of your services to determine if they are providing enough value for both your sustainability efforts and your bottom line. Because Hydrolix offers long-term data retention, you can also analyze trends in usage over time.
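
As a rough illustration of that comparison, the following Python sketch computes the cost per million requests for each service and flags any service whose unit cost is far above the median. The service names, the figures, and the 3x threshold are all hypothetical; in practice, the inputs would come from your cloud billing export and your request logs.

```python
from statistics import median

# Hypothetical monthly figures: (service name, cost in USD, requests served).
SERVICES = [
    ("checkout-api", 1200.0, 40_000_000),
    ("search", 3000.0, 120_000_000),
    ("legacy-reports", 900.0, 1_500_000),
    ("thumbnailer", 150.0, 6_000_000),
]


def cost_per_million(cost_usd: float, requests: int) -> float:
    """Return the cost per one million requests; idle services get infinity."""
    if requests == 0:
        return float("inf")
    return cost_usd / (requests / 1_000_000)


def flag_outliers(services, factor: float = 3.0):
    """Return names of services whose unit cost exceeds factor x the median."""
    unit_costs = {name: cost_per_million(cost, reqs) for name, cost, reqs in services}
    threshold = factor * median(unit_costs.values())
    return sorted(name for name, unit in unit_costs.items() if unit > threshold)


flagged = flag_outliers(SERVICES)
print(flagged)
```

A flagged service isn’t necessarily wasteful (some workloads are simply expensive per request), but it’s a reasonable place to start looking for idle or oversized resources.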

Use object storage for real-time data.

Many SaaS enterprises rely on real-time data to drive their platforms. This includes SaaS ad tech, observability, and AI-driven applications, just to name a few. Real-time data drives both a platform’s internal operations—such as monitoring application performance—and its external value—providing actionable data to customers as quickly as possible.

To drive real-time data, SaaS platforms often rely on energy-intensive hardware such as solid state drives (SSDs) and compute-intensive software like in-memory databases. The goal of these enhancements is to make data “hot,” or readily available for querying and analysis. However, they are extremely expensive in addition to being unsustainable at scale.

To solve this problem, SaaS solutions and the cloud data platforms that power them must shift to more energy-efficient (and affordable) object storage for their data layers. Traditionally, object storage took longer to access, but modern solutions like Hydrolix can provide near-real-time data at a fraction of the cost of legacy data layers while using much less energy.

The transition from a legacy data layer to cost-effective, comparatively sustainable object storage can be challenging for large enterprises, but cloud data platforms like Hydrolix can help with this transition—and efficiently power data at scale for SaaS platforms.

Scale components independently with decoupled, stateless architecture to reduce overprovisioning.

To ensure that each part of your system uses only the resources it needs, it’s important to architect your applications and services to be independently scalable. When it comes to real-time data, that means decoupling ingest, storage, and query. For instance, for a peak event, you may need to scale up data ingest. With tightly coupled components, you can’t just scale up one part of the system. That means scaling up leads to idle resources in other parts of the system. These resources go to waste, and you still have to pay for them.
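
To make the cost of tight coupling concrete, here is a small worked example in Python. It assumes a hypothetical system with three components of equal baseline capacity and a peak event in which only ingest demand grows; the capacity units and multipliers are invented for illustration.

```python
# Hypothetical baseline capacity (in arbitrary "units") for each component.
BASELINE = {"ingest": 10, "storage": 10, "query": 10}

# Hypothetical demand multipliers during a peak event: only ingest spikes.
PEAK_DEMAND = {"ingest": 4.0, "storage": 1.0, "query": 1.0}


def provision_coupled(baseline, demand):
    """Tightly coupled: every component scales by the largest multiplier."""
    factor = max(demand.values())
    return {name: units * factor for name, units in baseline.items()}


def provision_decoupled(baseline, demand):
    """Decoupled: each component scales by its own multiplier."""
    return {name: units * demand[name] for name, units in baseline.items()}


def idle_units(provisioned, baseline, demand):
    """Capacity provisioned beyond what each component actually needs."""
    return sum(provisioned[n] - baseline[n] * demand[n] for n in baseline)


coupled = provision_coupled(BASELINE, PEAK_DEMAND)
decoupled = provision_decoupled(BASELINE, PEAK_DEMAND)
print("coupled idle units:", idle_units(coupled, BASELINE, PEAK_DEMAND))
print("decoupled idle units:", idle_units(decoupled, BASELINE, PEAK_DEMAND))
```

Under these assumptions, the coupled system provisions 120 units to serve 60 units of real demand, so half of what you pay for sits idle. The decoupled system provisions exactly the 60 units it needs.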

In addition to being able to scale up just the components you need, you should be able to scale components down—even down to zero. You might want to scale down query-related components at night and on the weekends when fewer teams are accessing data. By incorporating both horizontal and vertical scaling, you can make sure that every part of your system is rightsized and that you aren’t wasting resources.
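
A scale-to-zero policy like the one described above can be sketched as a simple schedule. The function below is a minimal illustration, not any platform’s actual API: the function name, the replica counts, and the hours are all assumptions, and a real deployment would feed the result into whatever autoscaling mechanism your infrastructure provides.

```python
from datetime import datetime


def desired_query_replicas(now: datetime, daytime_replicas: int = 6) -> int:
    """Return how many query workers to run at the given local time.

    Hypothetical policy: full capacity on weekdays from 08:00 to 18:59,
    one worker on weekday evenings, and zero overnight and on weekends.
    """
    is_weekend = now.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    if is_weekend:
        return 0
    if 8 <= now.hour < 19:
        return daytime_replicas
    if 19 <= now.hour < 23:
        return 1
    return 0  # overnight: scale to zero


# Example: a Tuesday morning versus a Saturday morning.
print(desired_query_replicas(datetime(2024, 1, 2, 10, 0)))
print(desired_query_replicas(datetime(2024, 1, 6, 10, 0)))
```

A fixed schedule is the simplest option. If usage is less predictable, you could drive the same decision from observed query load instead of the clock.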

Many legacy systems have tightly coupled architecture, so paying down technical debt and decoupling components is difficult. By integrating a cloud data platform like Hydrolix, you can scale each part of your data layer independently without having to fully rearchitect your system.

Identify opportunities to trade performance for reduced usage.

One of Vogels’ laws for the frugal architect is that architecting is a series of trade-offs. Sometimes the right trade-off to make is reduced consumption and costs in return for less performance. While some services need real-time performance, others do not. Data that’s reserved for long-term analysis or compliance doesn’t always have to be queried immediately. AI models, which can be compute-intensive, generally don’t need to complete training runs in real time. Prioritizing performance over all else will drive up costs and resource consumption. If services are used infrequently, they should use fewer compute resources.

When it comes to considering resource usage versus performance for your data layer, you should have separate workloads for teams and services, each with their own independently scalable resources. In the case of Hydrolix, you can scale each part of the system as needed to fine-tune resource usage. One use case is to reduce the number of query peers for teams that don’t rely on real-time data. You’ll trade off some performance, but in return, you’ll consume fewer resources and have lower compute costs.

Keep resources close to the edge.

Edge computing provides cloud resources close to where they’re used. Instead of keeping the data that your customers need in a centralized location (where it might travel a great distance to reach end users), with edge computing, you can keep client resources close to end users. By reducing network traffic and the distance your data has to travel, you’ll use fewer resources. Cloud providers typically offer edge computing features.

Minimize the movement of data across regions.

Data transfer across cloud provider regions and zones is a necessary part of doing business for SaaS platforms. However, the movement of data (egress) can be costly and resource-intensive. In massively distributed environments that span the globe, it can be challenging to reduce data egress. Consider which services will benefit from having data close to the edge (for edge computing); examples include sensor data from IoT (internet of things) devices. At the same time, some data, such as internal logs, shouldn’t be moved at all; keeping it where it’s generated minimizes egress.

Compress data to reduce your storage footprint.

Reducing your overall storage footprint has a direct impact on resource consumption. This is especially important because the volume of incoming data is increasing dramatically, and if your company is growing, then your customer data is growing considerably as well. There are a number of ways you can reduce your storage footprint, with discarding data (through sampling, limited retention, or other methods) being the least ideal. Instead, columnar formats with built-in compression, such as Parquet, can reduce the size of your data by 90% or more, depending on the data. It’s not just a smaller storage footprint: you’ll also need to transfer less data. Hydrolix uses a proprietary high-density compression format to achieve up to 50x compression without compromising query performance.
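
Percentage reductions and “Nx” compression ratios describe the same thing in different units, and it helps to convert between them: a 90% reduction is a 10x ratio, and a 50x ratio is a 98% reduction. The Python sketch below shows the conversion and measures the reduction on some sample data. Note the assumptions: it uses zlib from the Python standard library as a stand-in (not Parquet or the Hydrolix format), and the log lines are synthetic and highly repetitive, so real data will usually compress less.

```python
import zlib


def reduction_percent(original_bytes: int, compressed_bytes: int) -> float:
    """Percentage of storage saved by compression."""
    return 100.0 * (1 - compressed_bytes / original_bytes)


def ratio_to_reduction(ratio: float) -> float:
    """Convert an N-x compression ratio to a percentage reduction."""
    return 100.0 * (1 - 1 / ratio)


# Hypothetical, highly repetitive log lines; real logs will compress less.
raw = "\n".join(
    f"2024-01-02T00:00:{i % 60:02d}Z GET /api/items 200 12ms" for i in range(10_000)
).encode("utf-8")
compressed = zlib.compress(raw, level=9)

print(f"zlib reduction: {reduction_percent(len(raw), len(compressed)):.1f}%")
print(f"10x ratio = {ratio_to_reduction(10):.0f}% reduction")
print(f"50x ratio = {ratio_to_reduction(50):.0f}% reduction")
```

Measuring the ratio on a representative sample of your own data is the only reliable way to estimate the savings, because compression depends heavily on how repetitive the data is.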

Understand your cloud provider’s sustainability goals.

Cloud providers such as Google Cloud Platform, AWS, and Azure all have ambitious goals, starting with 100% renewable energy and then moving towards complete decarbonization. But there are variations in their goals and how close each cloud provider is to achieving them. For example, AWS plans to have 100% renewable energy by 2025. Azure has been carbon neutral since 2012 and plans to be carbon negative by 2030; by 2050, Microsoft aims to have removed all of the carbon it has emitted since its founding. Google Cloud Platform plans to be net-zero carbon by 2030.

You can even get information on how energy-intensive each regional data center is through https://www.climatiq.io/data/explorer. While this doesn’t provide information about the energy source or how efficient it is (because it doesn’t include information about how much data passed through each center), you can get a sense of the overall carbon footprint of the centers your applications are using.

Reduce your impact with carbon offsets.

Your first priority should be to reduce your company’s energy consumption. But you can reduce your overall carbon footprint (to net-neutral or even negative) with carbon offsets. If you purchase carbon offsets, make sure the source is certified (one example of a certifier is Green-e Climate) and that there’s complete transparency about how the source is offsetting carbon. The method could include anything from planting trees to restoring wetlands to methane capture.

Power Your Sustainability Efforts With Hydrolix

Hydrolix is a cloud data platform optimized for transaction logs that allows you to ingest, store, and query real-time streaming data at scale. Hydrolix can power SaaS platforms in areas such as ad tech, observability, security, and media by providing a cost-effective data layer that reduces both cost and resource consumption for large volumes of data. You get the following benefits to help drive your sustainability efforts:

  • Always-available “hot” storage uses inexpensive, efficient object storage instead of resource-intensive SSDs or in-memory databases.
  • Scale all components independently to use only what you need and prevent overprovisioning and wasted resources.
  • Up to 50x compression dramatically reduces the storage footprint of your data.
  • Store your data in a virtual private cloud (VPC) to maximize cloud provider benefits, including increased efficiency.
  • Monitor both internal and external log data to better understand throughput and resource usage for your services.

With Hydrolix, cost really is a proxy for sustainability—you can lower your costs and your carbon footprint even as you scale your platform.

Next Steps

Learn more about Hydrolix and contact us for a POC.
