With the increasing popularity of streaming services and on-demand content, it’s uniquely challenging for media companies to manage and optimize services for their customers. To successfully monitor these services, you need to analyze real-time data on a terabyte scale. Many observability platforms aren’t able to handle this level of scale, and even when they are, the cost of ingesting big data is often astronomical.
Hydrolix is built for ingesting, storing, and querying data at scale, making it an excellent choice for analyzing data related to streaming and on-demand content. In this post, you'll learn how Hydrolix helps Paramount leverage a multi-CDN strategy to provide high-quality video delivery to their end users. To do so, Paramount monitors CDN, origin, and player analytics data sources in real time, then switches CDNs as needed so end users always have an optimized experience. By using Hydrolix, the Paramount PoC team estimated that they reduced their infrastructure spend by over 50% while increasing their data retention period 6x.
This post covers how Paramount uses Hydrolix to monitor multiple CDNs. You’ll also learn:
- How Hydrolix tables can ingest data from multiple sources, allowing you to see all your CDN data in one place.
- How to standardize naming across multiple data sources, a common issue when monitoring data sources that use different naming conventions.
- How to add custom enrichment to your data. For example, you can split URL parameters into substrings and map the substrings to columns in your table, use custom functions, and use an in-memory dictionary to look up derived values for your data.
Ingesting Multiple Sources in a Single Table
One of the biggest technical challenges of monitoring multiple CDNs is that each vendor delivers logs in a proprietary format using proprietary connectors. In order to compare and understand data from different CDNs, Paramount needed a way to transform all of these diverse data sources into a single, normalized table.
Hydrolix multi-schema tables allow for multiple sources and transforms. Transforms provide instructions on how data should be mapped to a Hydrolix table, giving you flexibility to standardize and enrich your data. With multi-schema tables, you can integrate data from multiple sources and formats into a single, unified view. For example, you can have a single table where one dataset is delivered via HTTP streaming and another via Kafka, and because Hydrolix tables support unlimited dimensionality, each source can have different column names.
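For context, each of the per-column snippets shown later in this post lives inside a larger transform document. Here is a rough sketch of that shape, based on the published Hydrolix transform format; the table name, timestamp column, and format string are illustrative placeholders, so consult the Hydrolix transform documentation for the authoritative schema:

```json
{
  "name": "fastly_transform",
  "type": "json",
  "settings": {
    "is_default": false,
    "output_columns": [
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "format": "2006-01-02T15:04:05Z",
          "primary": true
        }
      },
      {
        "name": "status_code",
        "datatype": {
          "type": "uint32",
          "source": { "from_input_field": "status" }
        }
      }
    ]
  }
}
```

Each source gets its own transform document like this one, and all of them write into the same table.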
Paramount deployed a dedicated Hydrolix for GKE cluster in their GCP account, created a table for normalized CDN logs, published separate transforms for each CDN vendor, and then configured each of their vendors to send data to their Hydrolix cluster. These transforms convert various raw data feeds into a unified Paramount standard. Paramount also leverages advanced functionality in Hydrolix such as dictionary lookup to enrich incoming stream data with business labels and metrics.

For more details on real-time log delivery for each CDN, see each vendor's documentation on configuring real-time log streaming.
Standardizing Log Data
Naming conventions often vary across data sources, including CDNs. For example, each CDN has a different name for HTTP status code data as shown in the next table.
| CDN | Akamai | Edgecast | Fastly | Standard |
| --- | --- | --- | --- | --- |
| Field Name | statusCode | status_code | status | status_code |
Fortunately, it's simple to configure a Hydrolix transform file to standardize naming conventions with `from_input_field`. If you aren't familiar with Hydrolix transforms yet, learn how to write your first transform.
```json
{
  "name": "status_code",
  "datatype": {
    "type": "uint32",
    "source": {
      "from_input_field": "statusCode"
    }
  }
}
```
This configuration takes incoming data from the source field `statusCode` and maps it to the column `status_code` in the Hydrolix table.
You can apply different transforms for each CDN. For Fastly, this would be the following transform:
```json
{
  "name": "status_code",
  "datatype": {
    "type": "uint32",
    "source": {
      "from_input_field": "status"
    }
  }
}
```
Adding Context With Metadata
When ingesting logs from different sources, you often need to add context, such as metadata about the source itself. For example, a Fastly log doesn't include a `source: "fastly"` field. To help identify each CDN in the Hydrolix destination table, Paramount created a new virtual column `source` containing the CDN name:
```json
{
  "name": "source",
  "datatype": {
    "type": "string",
    "default": "edgecast",
    "virtual": true
  }
}
```
When you need to enrich a source with additional data, you can use `"virtual": true` to create a column that isn't derived from the ingest source. In this example, the column's value is a constant because the transform for Edgecast ingest will always have `"edgecast"` as the source. Virtual fields aren't always constants, though; they can also be derived from scripts or SQL transforms.
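Staying with the constant case for a moment: because each CDN gets its own transform, the same virtual column can carry a different value per vendor. The Fastly transform, for example, would declare the identical column with its own default:

```json
{
  "name": "source",
  "datatype": {
    "type": "string",
    "default": "fastly",
    "virtual": true
  }
}
```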
If you’re just getting started with creating transforms for CDN sources, see working examples of CDN transform schemas.
Using SQL Transforms for Custom Enrichment
In addition to renaming and remapping columns, Hydrolix allows you to make more advanced modifications by writing SQL statements for incoming data. For example, you can do dictionary lookups, call custom functions, and perform computations.
Let's take a look at an example of a `sql_transform` that Paramount applied to incoming data from an Akamai CDN. This example shows how you can enrich data in the following ways:
- Split a string into substrings and then map those substrings to a table. You can extract parameters from a URL and map them to the table as separate columns.
- Use a custom function which is scoped to a Hydrolix project. Each table in Hydrolix belongs to a project. You can easily add your own custom functions through the Hydrolix UI and API and then call those functions as needed in your transforms.
- Use an in-memory dictionary to further enrich your data. Perform dictionary lookups and map the lookup results to fields in your table.
- Compute and map values to your table columns. For instance, you might want to sum the value of two fields or add a timestamp for when the data was received.
Here is the `sql_transform`:
```sql
SELECT
    concat('cpc_', account_number) as account_number,
    concat('/', path) as path,
    splitByChar(',', decodeURLComponent(assumeNotNull(customField))) as custom_array,
    custom_array[1] as stream_id,
    custom_array[2] as pop,
    custom_array[3] as client_asn,
    (request_end_time_msec + transfer_time_msec) as server_ttlb,
    visordb_visor_path_prefix(assumeNotNull(path), assumeNotNull(host)) as path_prefix,
    decodeURLComponent(assumeNotNull(user_agent)) as user_agent,
    dictGet('visor_akamai_tags',
            ('stream', 'stream_config_id', 'environment', 'workflow', 'business_unit'),
            (account_number, path_prefix)) as meta,
    tupleElement(meta, 'stream') as stream,
    tupleElement(meta, 'stream_config_id') as stream_config_id,
    tupleElement(meta, 'environment') as environment,
    tupleElement(meta, 'workflow') as workflow,
    tupleElement(meta, 'business_unit') as business_unit,
    NOT cached as midgress,
    now() as auto_receive_time,
    *
FROM {STREAM}
```
Let's break this code down further. In this example, Hydrolix uses the ClickHouse SQL parser, which exposes functions like `decodeURLComponent()`. This function decodes the URL from a `customField` included in Akamai CDN logs. The decoded URL is split into an array of substrings, which are then mapped to the fields `stream_id`, `pop`, and `client_asn` in the Hydrolix table, as shown in the next code snippet.
```sql
splitByChar(',', decodeURLComponent(assumeNotNull(customField))) as custom_array,
custom_array[1] as stream_id,
custom_array[2] as pop,
custom_array[3] as client_asn
```
With this code, a `customField` containing the value `"XXXXXXX,38458,15169"` would be split into three columns in the Hydrolix table:

- `stream_id: XXXXXXX`
- `pop: 38458`
- `client_asn: 15169`
Now these columns can be queried and analyzed individually. You can use a similar approach whenever you need separate columns for parameters from URLs or in other use cases where you need to extract a substring from a larger string and store the data in its own column.
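For example, if your logs carried a full request URL instead of a pre-split custom field, ClickHouse's built-in URL functions could produce similar per-parameter columns. Here is a minimal sketch, assuming a hypothetical `request_url` input field that isn't part of Paramount's schema:

```sql
-- Hypothetical: extract query parameters from a URL such as
-- https://cdn.example.com/video.m3u8?session=abc123&bitrate=4500
extractURLParameter(assumeNotNull(request_url), 'session') as session_id,
extractURLParameter(assumeNotNull(request_url), 'bitrate') as bitrate,
path(assumeNotNull(request_url)) as url_path,
```

Each extracted parameter then lands in its own queryable column, just like the `custom_array` fields above.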
If you need to perform computations on your data, you can do so in the `sql_transform`, as the following line from the example shows:
```sql
(request_end_time_msec + transfer_time_msec) as server_ttlb,
```
In this case, a simple calculation provides the value of `server_ttlb` as the sum of `request_end_time_msec` and `transfer_time_msec`.
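The same spot in the transform can host any expression ClickHouse can evaluate. As a quick sketch, here are two hypothetical computed columns; the byte-count field is invented for illustration and isn't part of Paramount's schema:

```sql
-- Hypothetical computed columns: a unit conversion and a derived error flag
round(bytes_sent / 1048576, 2) as megabytes_sent,
if(status_code >= 500, 1, 0) as is_server_error,
```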
Next, the `sql_transform` uses a custom function that extracts specific information from the path of the incoming data.
```sql
visordb_visor_path_prefix(assumeNotNull(path), assumeNotNull(host)) as path_prefix,
dictGet('visor_akamai_tags',
        ('stream', 'stream_config_id', 'environment', 'workflow', 'business_unit'),
        (account_number, path_prefix)) as meta,
tupleElement(meta, 'stream') as stream,
tupleElement(meta, 'stream_config_id') as stream_config_id,
tupleElement(meta, 'environment') as environment,
tupleElement(meta, 'workflow') as workflow,
tupleElement(meta, 'business_unit') as business_unit,
```
The specifics of the custom `visordb_visor_path_prefix` function aren't included here, but you can create your own functions in the Hydrolix UI or through the API, and these functions are scoped to Hydrolix projects.
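While the real implementation isn't published, a hypothetical path-prefix helper written as a ClickHouse-style lambda might look like the sketch below. Note that Hydrolix registers custom functions through its UI and API rather than through SQL DDL, so treat the `CREATE FUNCTION` form here as illustrative only:

```sql
-- Hypothetical sketch: keep only the first two path segments,
-- e.g. '/live/news/segment42.ts' -> '/live/news'
CREATE FUNCTION my_path_prefix AS (p) ->
    concat('/', arrayStringConcat(arraySlice(splitByChar('/', p), 2, 2), '/'));
```

The real function also takes the host as a second argument, presumably so the prefix logic can vary by domain.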
Next, the `sql_transform` performs a real-time lookup with `dictGet()`, a ClickHouse function, to pull values from an in-memory dictionary file and further enrich incoming data. This functionality improves indexing and reduces the need for expensive joins at query time.
As the example shows, each key from the dictionary returns a tuple, and the required data is extracted and mapped to columns in the Hydrolix table.
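To make the lookup concrete, here is a sketch with invented dictionary contents; the key and attribute values below are purely illustrative:

```sql
-- Hypothetical dictionary rows (composite key: account_number, path_prefix):
--   cpc_12345 | /live/news  -> ('news-live', 'cfg-001', 'prod', 'live', 'news')
--   cpc_12345 | /vod/movies -> ('movies-vod', 'cfg-002', 'prod', 'vod', 'film')

-- For a log row with account_number 'cpc_12345' and path_prefix '/live/news',
-- the lookup returns the first tuple and the extraction yields:
dictGet('visor_akamai_tags',
        ('stream', 'stream_config_id', 'environment', 'workflow', 'business_unit'),
        (account_number, path_prefix)) as meta,  -- ('news-live', 'cfg-001', 'prod', 'live', 'news')
tupleElement(meta, 'workflow') as workflow      -- 'live'
```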
Finally, the `sql_transform` computes `auto_receive_time` with the time function `now()`.
These are just a few examples of how you can use a `sql_transform` to process and enrich incoming data.
Summary
In this post, you learned how Paramount uses Hydrolix to monitor multiple CDNs. You also learned how to use Hydrolix transforms to standardize, enrich, and process your data.
Next Steps
If you're working with big data and don't have a Hydrolix account yet, sign up for a free trial today. With Hydrolix, you get industry-leading data compression, lightning-fast queries, and unmatched scalability for your observability data, all at a fraction of what observability platforms typically cost. Learn more about Hydrolix.