Hydrolix Delivers Big Game Observability for Major Broadcast and Streaming Content Provider

Learn how Hydrolix provides observability to our clients during peak streaming events.

Franz Knupfer

Published:

Feb 28, 2024

7 minute read

With a record-setting 123.7 million viewers tuning in, the big game between San Francisco and Kansas City wasn’t just a game. It was a milestone in over-the-top live sports broadcasting delivered across the platform of one of the world’s largest broadcast, cable, and streaming providers.

Behind the scenes, Hydrolix provided the customer with end-to-end observability and deliverability intelligence, creating a complete view from stream origin to end user. Our team worked directly with the client’s operations team, including on-site, to navigate the immense data challenges of the event.

Performance Highlights:

  • Peak ingestion rate of 10.8 million rows/sec, with 53 billion records collected and 41 TiB of data transformed into 5.76 TiB of compressed data stored. (Hydrolix’s merge service will continue to compress the data.)
  • Across all partners, we saw peaks of 20 million rows/sec, and managed over 100 billion log lines.

The next image shows a dashboard of Hydrolix’s ingest service during the event. In this snapshot, Hydrolix was ingesting 9.73 million rows per second with 3 seconds of ingest latency. Data is normalized and transformed as it’s ingested and then stored in cost-effective cloud-based object storage.

Dashboard shows Hydrolix ingestion statistics

Because Hydrolix is stateless, decoupled, and uses cloud-native Kubernetes, every part of the system can be scaled independently. For this use case, stream heads and stream peers were scaled up to provide massive parallelism.

“This is the sort of day Hydrolix was built for. Not just this customer but multiple partners faced massive peaks in traffic, requiring real-time operational decisions based on colossal data inflows. We’re proud to say Hydrolix delivered. Achieving this scale and performance, as cost-effectively as we did, sets us apart in the industry.”

Marty Kagan, Co-founder of Hydrolix

Let’s take a look at some of the streaming media use cases our customers and partners relied on for this event.

Monitoring Over-the-Top Video at Scale

As end users move away from traditional cable, over-the-top streaming video content has become increasingly popular. However, providing a high-quality over-the-top viewing experience to tens of millions of users comes with a unique set of challenges. In addition to being spread across the globe, end users stream events through a wide range of devices, content delivery networks (CDNs), and internet service providers (ISPs). And to make matters worse, large-scale events can be targets for both malicious activity and streaming piracy.

To provide a quality experience to as many viewers as possible, providers need to monitor a wide range of logs and metrics, including CDN performance, Common Media Client Data (CMCD), IP activity, edge data, and device data.

CDN Performance

To serve an event to millions of users around the globe, providers rely on content delivery networks to ensure performant streaming. At any given moment, a CDN may experience high loads and inconsistent performance, especially for very large events. Major providers typically use multiple CDNs, and it’s not uncommon to shift traffic from one CDN to another to improve the end user experience. Here are just a few important CDN metrics to monitor:

  • CDN bandwidth
  • 4xx error codes
  • 5xx error codes
  • Error rate % per CDN
  • Success rate % per CDN

To effectively compare CDN performance, you can ingest multiple data sources into one Hydrolix table. This is a common use case for our customers and partners. In addition to making comparisons between CDNs easier, you’ll avoid complex, compute-intensive JOINs across multiple tables. Hydrolix tables are extremely flexible. Each ingest source can have different columns. And you can use Hydrolix transforms to normalize and standardize incoming data on the fly to ensure your comparisons are consistent.
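As a rough illustration of the kind of comparison a single, normalized table enables, here is a minimal Python sketch that computes error and success rates per CDN from a handful of normalized log rows. The column names and sample records are hypothetical; in practice you would typically run this aggregation as a SQL query directly against the Hydrolix table.

```python
from collections import defaultdict

# Hypothetical normalized log rows; in practice these columns would be
# produced by Hydrolix transforms at ingest time.
rows = [
    {"cdn": "cdn_a", "status": 200},
    {"cdn": "cdn_a", "status": 504},
    {"cdn": "cdn_b", "status": 200},
    {"cdn": "cdn_b", "status": 200},
    {"cdn": "cdn_b", "status": 403},
]

totals = defaultdict(int)
errors = defaultdict(int)

for row in rows:
    totals[row["cdn"]] += 1
    if row["status"] >= 400:  # count 4xx and 5xx responses as errors
        errors[row["cdn"]] += 1

for cdn in sorted(totals):
    error_rate = 100.0 * errors[cdn] / totals[cdn]
    print(f"{cdn}: error rate {error_rate:.1f}%, success rate {100.0 - error_rate:.1f}%")
```

Because every CDN's logs land in the same table with the same normalized columns, the same calculation works across providers without joins.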

To learn about the value of ingesting multiple data sources, watch this TF1 customer story.

Common Media Client Data

Common Media Client Data (CMCD) is an open specification for in-depth playback information from an end user’s media player. It includes a wealth of information that you can’t collect from server-side or CDN monitoring. While many widely used media players still don’t include CMCD support, major streaming providers typically build CMCD support into their media players to collect this valuable data. Here are just a few examples of CMCD data that you can monitor with Hydrolix:

  • Average buffer size
  • Buffer starvation count
  • Rebuffer sessions
  • Bitrate
  • CMCD encoded bitrate per CDN

Buffer starvation and rebuffering lead to frustrating viewer experiences, and being able to correlate this data to specific CDNs, devices, and ISPs can be helpful for troubleshooting and optimizing performance. The last thing any viewer wants is to miss a single play in overtime.

CMCD can provide a huge amount of metadata. If you are collecting data from tens of millions of viewers, you need high-performance data ingest (and support for late-arriving data) to capture the data in a timely manner. You also need a cost-effective storage solution and high-density compression to ensure that the costs of collecting and keeping all that data don’t spiral out of control. Hydrolix utilizes massive parallelism and object storage for terabyte-scale ingest and long-term, cost-effective data that is always “hot” for query analysis. And Hydrolix is designed to handle late-arriving and out-of-order data.
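For a concrete sense of what this data looks like on the wire, here is a minimal Python sketch that decodes a few CMCD keys from a segment request’s query string. The keys shown (bl for buffer length, br for encoded bitrate, bs for buffer starvation, sid for session ID) come from the CMCD specification; the URL and everything downstream of the decoding are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical segment request carrying CMCD data as a query argument.
url = "https://cdn.example.com/seg_42.m4s?CMCD=bl%3D2150%2Cbr%3D6000%2Cbs%2Csid%3D%22abc-123%22"

cmcd_raw = parse_qs(urlparse(url).query)["CMCD"][0]

# CMCD is a comma-separated list of key=value pairs; valueless keys
# (such as bs, buffer starvation) are boolean flags that mean "true".
cmcd = {}
for item in cmcd_raw.split(","):
    key, _, value = item.partition("=")
    cmcd[key] = value.strip('"') if value else True

print(cmcd)  # {'bl': '2150', 'br': '6000', 'bs': True, 'sid': 'abc-123'}
```

Each decoded record can then be ingested as a row, so buffer health can be sliced by session, CDN, device, or ISP at query time.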

Device Data

There are many pieces that must fit together for performant over-the-top video, but only one thing needs to go wrong for end users to get frustrated. What happens if an end user has poor buffering performance, but all the incoming data suggests that content delivery is working as intended? The end user might have an old device with an outdated operating system, but don’t worry: you’ll still get blamed for providing a poor streaming experience! By monitoring user devices, you get valuable information about everything from screen size to which encodings the device supports, allowing you to provide an optimized experience.

Device data is also valuable for advertising video on demand (AVOD). As an example, you might want to serve Apple ads to iPhone users and Android ads to Android users.

Hydrolix offers an integration with ScientiaMobile, a real-time device detection solution that can identify more than 100,000 device profiles. Device data is highly dimensional, so you need a high-performance columnar datastore like Hydrolix to retrieve it efficiently.
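As a rough sketch of how device dimensions support troubleshooting, the snippet below groups hypothetical playback records by device model and operating system to surface which combinations rebuffer the most. The field names and values are illustrative and are not the actual ScientiaMobile profile schema.

```python
from collections import defaultdict

# Hypothetical playback records already enriched with device attributes.
sessions = [
    {"model": "smart_tv_x", "os": "tvOS 17", "rebuffers": 0},
    {"model": "smart_tv_x", "os": "tvOS 17", "rebuffers": 1},
    {"model": "settop_y", "os": "legacy_os 4.2", "rebuffers": 7},
    {"model": "settop_y", "os": "legacy_os 4.2", "rebuffers": 5},
]

stats = defaultdict(lambda: {"sessions": 0, "rebuffers": 0})
for s in sessions:
    key = (s["model"], s["os"])
    stats[key]["sessions"] += 1
    stats[key]["rebuffers"] += s["rebuffers"]

# Rank device/OS combinations by average rebuffers per session.
ranked = sorted(stats.items(), key=lambda kv: -kv[1]["rebuffers"] / kv[1]["sessions"])
for (model, os_name), agg in ranked:
    avg = agg["rebuffers"] / agg["sessions"]
    print(f"{model} / {os_name}: {avg:.1f} rebuffers per session")
```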

IP Activity

It’s already challenging to provide a great user experience to tens of millions of viewers. What happens when you also have malicious attackers trying to slow down traffic or siphon off traffic for pirated streams?

Streaming piracy is a huge problem. In addition to lost revenue and viewership, pirate streaming sites often run disreputable ads, which can damage your brand. By monitoring IP addresses, you can determine which IPs are streaming your content. Then you can either shut down or throttle the stream. You can even use your data to determine if throttling is the most effective approach. Users frustrated with a throttled stream might be more likely to turn to an official stream than just find another pirated option. It’s a great hypothesis to test after an event, but it’s only possible if you keep all your data readily available. Because Hydrolix maximizes the performance of cost-effective object storage, you can keep all your data “hot” and query it long after the event is over.
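As one illustrative heuristic (not necessarily how any particular customer does it), the sketch below flags IP addresses with an unusually high number of concurrent sessions, a common signal of restreaming or scraping. The session records and the threshold are hypothetical.

```python
from collections import Counter

# Hypothetical active-session records keyed by client IP.
active_sessions = [
    {"ip": "203.0.113.7", "session_id": f"s{i}"} for i in range(250)
] + [
    {"ip": "198.51.100.20", "session_id": "s_a"},
    {"ip": "198.51.100.21", "session_id": "s_b"},
]

CONCURRENCY_THRESHOLD = 50  # illustrative cutoff, tuned per service in practice

sessions_per_ip = Counter(record["ip"] for record in active_sessions)
suspects = {ip: n for ip, n in sessions_per_ip.items() if n > CONCURRENCY_THRESHOLD}

for ip, count in suspects.items():
    print(f"{ip}: {count} concurrent sessions -> candidate for throttling or blocking")
```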

IP addresses can also be used to detect DDoS attacks. See How Elkjøp Stopped a Massive DDoS Attack to learn about this particular use case.

Edge Data

Large-scale networks typically use edge computing to process data at the edge of the network, closer to end users, which improves performance. This processing can occur on edge servers, IoT devices, personal devices, and more. One key metric to collect is time to last byte (TTLB), which gives you an overarching view of how long it takes to send data from the origin to the device where it’s streamed. You’ll also want to monitor 2xx, 4xx, and 5xx status codes for each edge server.
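Here is a minimal sketch of summarizing edge logs along these lines, assuming hypothetical field names (edge, ttlb_ms, status). A production pipeline would compute these aggregates with queries over the full log table rather than in application code.

```python
from collections import defaultdict, Counter
from statistics import median

# Hypothetical edge access-log records.
records = [
    {"edge": "edge-iad-1", "ttlb_ms": 180, "status": 200},
    {"edge": "edge-iad-1", "ttlb_ms": 950, "status": 200},
    {"edge": "edge-iad-1", "ttlb_ms": 210, "status": 504},
    {"edge": "edge-fra-2", "ttlb_ms": 140, "status": 200},
    {"edge": "edge-fra-2", "ttlb_ms": 160, "status": 404},
]

ttlb_by_edge = defaultdict(list)
status_classes = defaultdict(Counter)

for r in records:
    ttlb_by_edge[r["edge"]].append(r["ttlb_ms"])
    status_classes[r["edge"]][f"{r['status'] // 100}xx"] += 1  # bucket into 2xx/4xx/5xx

for edge, samples in ttlb_by_edge.items():
    print(f"{edge}: median TTLB {median(samples):.0f} ms, status mix {dict(status_classes[edge])}")
```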

Hydrolix for Streaming Observability at Scale

Hydrolix is built to handle observability data for large streaming events at petabyte scale, which makes it an excellent fit for the largest sporting events in the world.

Because Hydrolix is fully decoupled and stateless, it’s resilient under heavy loads. Users can provision additional ingest and query resources as needed for major events, then scale down resources after the event.

Decoupled object storage is much more cost-effective than tightly coupled storage solutions, so you can keep more data available for quick analysis for much longer. Storage is flexible and pay-as-you-go. You don’t need to preallocate extra storage, which saves both time and money. This is particularly important for large events that generate massive volumes of data. It may take your teams months to sift through that data, so it should remain readily available for insights.

To prevent resource contention, you can use precision query scaling to create separate query pools for separate groups of users. Query pools are workload-specific groups of query peers that can scale independently of one another. On game day, operations teams can make sure that the query pools powering critical dashboards get extra resources. CDN performance and routing as well as anti-piracy analysis tasks can also be scaled to meet performance goals without contending with each other or ingest-focused compute resources.

Hydrolix supports data ingest at terabyte scale from many sources, and you can combine multiple sources into one table. Hydrolix is also optimized for ad hoc queries at any scale: you can get sub-second query latency even on tables of 100 billion rows. See Maximizing Query Performance for 100+ Billion Row Data Sets to learn more about how Hydrolix handles queries.
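Hydrolix queries are written in SQL. As a rough sketch of an ad hoc query issued over HTTP, the snippet below posts a per-CDN error-rate aggregation; the hostname, endpoint path, authentication scheme, and project/table names are placeholders, so consult the Hydrolix query API documentation for the exact interface.

```python
import requests  # assumes the requests package is installed

# Placeholder endpoint and credentials; the exact query API path and
# auth scheme are deployment-specific.
HDX_QUERY_URL = "https://hydrolix.example.com/query"
API_TOKEN = "REPLACE_ME"

# Ad hoc SQL: error rate per CDN over the last hour of game-day traffic.
# Project and table names are hypothetical.
sql = """
SELECT cdn,
       countIf(status >= 400) / count() AS error_rate
FROM gameday.cdn_logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY cdn
ORDER BY error_rate DESC
"""

resp = requests.post(
    HDX_QUERY_URL,
    params={"query": sql},
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)
```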

Perhaps most importantly, Hydrolix achieves this high performance with the lowest unit costs in the industry. There’s no need to provision storage ahead of time because Hydrolix maximizes the performance of flexible, horizontally scalable object storage. Hydrolix uses a high-density compression algorithm to achieve compression rates of 20x-50x.

Next Steps

Learn more about Hydrolix and contact us for a POC.
