HYDROLIX BLOG

Ponderings, insights and industry updates

Multi CDN monitoring

December 2, 2022

Author: David Sztykman

Paramount leverages a multi-CDN strategy to stream their videos to end-users. In order to monitor end-user experience and ensure the highest quality video delivery, Paramount ingests logs from multiple CDNs, origins, and player analytics data sources. Their goal is to provide realtime visibility into a viewer’s quality of experience, enable root cause analysis of stream-impacting issues, and equip operations teams to debug aggregate data as well as individual stream sessions.

Paramount came to Hydrolix with the challenge to build a solution which could:

  • support real-time data streaming
  • automatically scale to their needs
  • allow data enrichment via custom business logic
  • minimize system complexity and operational overhead
  • and reduce overall costs even while increasing data retention and usage

In this series of blog posts we’ll explore how Paramount leveraged Hydrolix to tackle all of these challenges.

Multiple Sources -> Single Table

One of the biggest technical downsides to a multi-CDN strategy is that each vendor delivers logs in a proprietary format using proprietary connectors. In order to make sense of all this data Paramount needed a way to transform all of these diverse data feeds into a single, normalized table.

Hydrolix multi-schema tables allow for multiple sources and transforms. This flexibility makes it very easy to integrate data from multiple sources and formats into a single, unified view. For example, one can have a single table where one data set is delivered via HTTP streaming and another via Kafka, each with different column names.

Paramount deployed a dedicated Hydrolix for GKE cluster in their GCP account, created a table for normalized CDN logs, published separate transforms for each CDN vendor, and then configured each of their vendors to send data to the Paramount Hydrolix cluster. These transforms convert various raw data-feeds into a unified Paramount standard.

Paramount also leverages advanced functions such as dictionary lookups to enrich incoming stream data with business labels and metrics.

For more details on real-time log delivery for each CDN, you can have a look here:

Standardization

Each CDN can express the same data under a different column name.
For example, the HTTP status code has a different name in every CDN, and Paramount defines its own standard name for it.

CDN | Akamai | Edgecast | Fastly | Standard
Field Name | statusCode | status_code | status | status_code
CDN format

Mapping Detail

Visor Field | Definition | Source per CDN (Cloudfront, Edgecast, Fastly, Akamai, Google)
account_number | Account Number | Edgecast: Derived from the first sub path of rewritten_path; Akamai: CPCode
backend_ip | IP of the “backend”, either an origin or upstream CDN tier | Fastly: beresp.backend.ip
backend_ttlb | Time To Last Byte in milliseconds of the backend origin or upstream CDN. The time taken from the edge server forwarding the request to the backend machine to the time the edge server received and cached the response. | Edgecast: read_time * 1000; Fastly: Custom var.backend_ttlb variable based on time.elapsed.msec from vcl_fetch to vcl_log divided by 1000
business_unit | CBS Business Unit which owns the workflow | All CDNs: Visor Metadata
bytes_in | Bytes sent from client to the CDN in the request | Cloudfront: cs-bytes; Edgecast: bytes_in; Fastly: req.bytes_read; Akamai: n/a; Google: httpRequest.requestSize
bytes_out | Bytes sent from CDN to the client in the response | Cloudfront: sc-bytes; Edgecast: bytes_out; Fastly: resp.bytes_written; Akamai: bytes; Google: httpRequest.responseSize
cache_shield | | Fastly: fastly.ff.visits_this_service > 0
cache_status | Simplification of the X-Cache header for comparison with other CDNs. True indicates the word ‘HIT’ was found in the X-Cache header and indicates some kind of origin offload. (Ex: if the cache expired but the origin returned a 304, the origin didn’t have to send back the full body) | Edgecast: cache_status; Fastly: fastly_info.state; Akamai: cacheStatus
cached | Simplification of cache_status. Boolean summarizing cached as true or false | Cloudfront: derived from x-edge-response-result-type; Edgecast: Derived from cache_status (TCP_HIT or TCP_PARTIAL_HIT makes this true, otherwise false); Fastly: Derived from fastly_info.state if it contains “HIT”; Akamai: derived from cache_status; Google: derived from jsonPayload.cacheStatus
client_asn | The autonomous system number sent from the client request | Edgecast: client_asn; Fastly: client.as.number; Akamai: Now added as custom field
client_geo_country | The 2 letter country code from where the client sent the request | Cloudfront: c-country; Edgecast: client_geo_country; Fastly: client.geo_country_code; Akamai: country; Google: jsonPayload.clientRegionCode
client_ip | Client IP address from where the request was sent | Cloudfront: c-ip; Edgecast: client_ip; Fastly: req.http.Fastly-Client-IP; Akamai: client_ip (cliIP); Google: httpRequest.remoteIP
host | The host name the client requested | Cloudfront: cs-host; Edgecast: host; Fastly: req.http.host; Akamai: Request Host (reqHost); Google: httpRequest.requestUrl
client_ttfb | The time to first byte | Cloudfront: time-to-first-byte; Edgecast: n/a; Fastly: client_ttfb; Akamai: Turnaround time; Google: edgecache.googleapis.com/edge_cache_route_rule/http_ttfb_by_client
environment | A field to filter out dev, qa, prod data | All CDNs: Visor Metadata
extension | The file extension of the file requested | Derived from path; Fastly: req.url.ext
forward_for | The client’s original IP address that was forwarded within the request | Cloudfront: x-forwarded-for; Edgecast: x-forwarded-for (custom field from header); Fastly: forward_for; Akamai: X-Forwarded-For (xForwardedFor); Google: httpRequest.remoteIP
failover_status | Shield header to specify if a request retried in another region | Fastly: resp.status
if_modified_since | The value of the request header if-modified-since | Fastly: req.http.If-Modified-Since
if_none_match | The value of the if-none-match header | Fastly: req.http.If-None-Match
if_unmodified_since | The value of the if-unmodified-since header | Fastly: req.http.If-Unmodified-Since
midgress | The intra-CDN tier the content was served from | midgress; derived from cache_status
method | HTTP request method | Cloudfront: cs-method; Edgecast: method; Fastly: method; Akamai: Request Method (reqMethod); Google: httpRequest.requestMethod
path | Request path | Cloudfront: cs-uri-stem; Edgecast: path; Fastly: req.url.path; Akamai: Request Path (reqPath); Google: httpRequest.requestUrl
pop | The CDN POP (airport/city code), should be prefaced with city | Cloudfront: x-edge-location; Edgecast: pop; Fastly: server.datacenter; Akamai: Now added as custom field; Google: jsonPayload.metroIataCode
query | The query string parameter within the path of a request | Cloudfront: cs-uri-query; Edgecast: query; Fastly: req.url.qs; Akamai: query string (queryStr); Google: httpRequest.requestUrl
origin_request_id | A hash generated request identifier returned from the origin. | custom field, derived from x-req-id header returned from origin
range_request | “Range” request header used by a client to make a byte-range request for a portion of an asset | Fastly: req.http.Range; range?
range_response | Value of the Range response header after client made a byte range request | Cloudfront: sc-range-start + sc-range-end; Fastly: resp.http.Range
referer | The address of where the request was made from | Cloudfront: cs-referer; Edgecast: referer; Fastly: req.http.Referer; Akamai: referer; Google: httpRequest.referer
request_id | A hashed id to trace a request from edge, shield, and origin | Cloudfront: request_id; Edgecast: request_id; Fastly: request_id; Akamai: request_id (reqId); Google: jsonPayload.requestId
role | Used to help determine the edge, shield, or shield regions | All CDNs: Visor Metadata
retrans | The number of packets in the current connection that contained data being retransmitted counted across the lifetime of the connection. | Fastly: client.socket.tcpi_total_retrans
rtt_msecs | TCP roundtrip time in milliseconds | Fastly: client.socket.tcpi_rtt / 1000
server_ip | IP of the edge node that serves the content from the CDN | Edgecast: server_ip; Fastly: server.ip; Akamai: Edge IP - ghost_ip
server_ttlb | Time to Last Byte in milliseconds from the server. This is the total time Fastly spent processing the request including any backend request. The server’s TTLB should capture the latency of processing and serving the request from the CDN perspective. | Cloudfront: time_taken; Edgecast: total_time; Fastly: time.elapsed.usec / 1000; Akamai: (request end time - request time) + transfer time; Google: edgecache.googleapis.com/edge_cache_route_rule/http_ttlb_by_client
status_code | The HTTP status code returned | Cloudfront: sc-status; Edgecast: status; Fastly: resp.status; Akamai: HTTP Status Codes; Google: httpRequest.status
stream | A friendly name used to identify a live stream | All CDNs: Visor Metadata
timestamp | UTC time in 100s of nanoseconds for when the server first received the request. If clocks were synchronized, should be later than or equal to when the player sent the request. | Cloudfront: timestamp; Edgecast: timestamp; Fastly: timestamp; Akamai: request_time; Google: jsonPayload.receiveTimestamp
user_agent | Identifier of the browser or device that made the request | Cloudfront: cs-user-agent; Edgecast: user_agent; Fastly: req.http.User-Agent; Akamai: user-agent; Google: httpRequest.userAgent
workflow | An organizational name that groups a live or vod video workflow | All CDNs: Visor Metadata
session_id | | All CDNs: CMCD field, derived from query string
buffer length | | All CDNs: CMCD field, derived from query string
buffer starvation | | All CDNs: CMCD field, derived from query string
Measured Throughput | | All CDNs: CMCD field, derived from query string
Content ID | | All CDNs: CMCD field, derived from query string
Details field mapping
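
Several rows in the mapping table describe derivations instead of simple renames. The Python sketch below only mirrors three of those rules (the Edgecast and Fastly rules for cached, and the Fastly unit conversion for server_ttlb) to make the logic concrete. The function names are hypothetical, and in Hydrolix this logic would live in the transform itself, not in Python.

```python
def cached_from_edgecast(cache_status):
    """Edgecast rule from the table: TCP_HIT or TCP_PARTIAL_HIT means cached."""
    return cache_status in ("TCP_HIT", "TCP_PARTIAL_HIT")


def cached_from_fastly(state):
    """Fastly rule from the table: fastly_info.state contains "HIT"."""
    return "HIT" in state


def fastly_server_ttlb_ms(elapsed_usec):
    """Fastly rule from the table: time.elapsed.usec / 1000 gives milliseconds."""
    return elapsed_usec / 1000


# Hypothetical sample values, for illustration only.
print(cached_from_edgecast("TCP_PARTIAL_HIT"))
print(cached_from_fastly("MISS-CLUSTER"))
print(fastly_server_ttlb_ms(2500))
```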

In order to get the proper name, we can use the from_input_field function in the transform:

This means that we are creating a column status_code by using the source field statusCode.
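
To make that mapping concrete, here is a minimal Python sketch that builds a column definition pointing status_code at the source field statusCode. Only from_input_field comes from the text above; the surrounding keys (name, datatype, type, source) and the uint32 type are assumptions about the transform layout and should be checked against the Hydrolix transform documentation.

```python
import json


def column_from_input_field(name, input_field, datatype="uint32"):
    """Build a column definition that reads `name` from `input_field`.

    The key layout is an assumed shape, not a verified Hydrolix schema.
    """
    return {
        "name": name,
        "datatype": {
            "type": datatype,
            "source": {"from_input_field": input_field},
        },
    }


# Akamai delivers the status as statusCode; the standard column is status_code.
column = column_from_input_field("status_code", "statusCode")
print(json.dumps(column, indent=2))
```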

We use the same principle for each CDN and for each column.

For Fastly this would be the following transform:
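
As a rough illustration of the per-CDN principle (not the actual Fastly transform), the Python sketch below renames the status field of a record to the standard name, using the field names from the CDN format table. The sample record and the helper name normalize_status are hypothetical.

```python
# Status-code field name per CDN, taken from the "CDN format" table.
STATUS_FIELD_BY_CDN = {
    "akamai": "statusCode",
    "edgecast": "status_code",
    "fastly": "status",
}

STANDARD_STATUS_FIELD = "status_code"


def normalize_status(record, cdn):
    """Return a copy of record with the CDN-specific status field renamed."""
    source_field = STATUS_FIELD_BY_CDN[cdn]
    normalized = {k: v for k, v in record.items() if k != source_field}
    normalized[STANDARD_STATUS_FIELD] = record[source_field]
    return normalized


fastly_record = {"status": 200, "method": "GET"}  # hypothetical sample log line
fastly_row = normalize_status(fastly_record, "fastly")
print(fastly_row)
```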

To help identify each CDN in the destination table Paramount also creates a new virtual column source containing the CDN name:
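
The effect of such a source column can be sketched in Python as follows. This is only an illustration under the assumption that each transform stamps its rows with a constant CDN name; the helper tag_source and the sample rows are hypothetical.

```python
from collections import Counter


def tag_source(record, cdn_name):
    """Return a copy of record with a constant source column added."""
    return {**record, "source": cdn_name}


# Rows from different CDNs land in one table and stay distinguishable.
rows = [
    tag_source({"status_code": 200}, "akamai"),
    tag_source({"status_code": 404}, "fastly"),
    tag_source({"status_code": 200}, "fastly"),
]

requests_per_cdn = Counter(row["source"] for row in rows)
print(requests_per_cdn)
```

With the column in place, any aggregate over the unified table can be grouped or filtered by CDN.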

We have added a list of transforms to our GitHub repository to provide working examples for each of these CDNs, and will be adding more shortly:

https://github.com/hydrolix/transforms

Custom Enrichment

CDN logs contain important information, and adding external information allows Paramount to see not only bytes delivered but also more details about each streaming session. This is done by extracting the stream ID from the path and query parameters and correlating it with an in-memory dictionary.

In addition to renaming and remapping columns, Hydrolix allows users to write SQL statements for the streaming data coming into the platform. This allows users to make advanced modifications and apply additional business notations to the indexed information.

For example, in the Akamai transform Paramount applies the following sql_transform:

Akamai logs contain a customField. Here we split the custom field into an array using the , character.

Then we take the different values of that array and assign them to new columns.

For this value "customField": "XXXXXXX,38458,15169" we’ll create 3 columns:

  • stream_id: XXXXXXX
  • pop: 38458
  • client_asn: 15169
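
The split described above is done in SQL inside the sql_transform. The Python sketch below only mirrors the same logic on the sample value so the result is easy to check; the helper name split_custom_field and the padding behaviour for short values are assumptions.

```python
# Target columns, in the order the values appear in customField.
CUSTOM_FIELD_COLUMNS = ("stream_id", "pop", "client_asn")


def split_custom_field(custom_field):
    """Split customField on "," and assign each value to its column."""
    values = custom_field.split(",")
    # Pad so that missing trailing values become None instead of being dropped.
    values += [None] * (len(CUSTOM_FIELD_COLUMNS) - len(values))
    return dict(zip(CUSTOM_FIELD_COLUMNS, values))


columns = split_custom_field("XXXXXXX,38458,15169")
print(columns)
```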

We also created a custom function which extracts specific information from the path of the content:
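
The custom function itself is not reproduced here. As a hedged illustration of path-based extraction, the Python sketch below pulls two values that the mapping table describes as derived from the path: the first sub path (used for account_number) and the file extension. The path layout and function names are hypothetical.

```python
import posixpath


def first_sub_path(path):
    """Return the first non-empty segment of a request path, or ""."""
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else ""


def file_extension(path):
    """Return the file extension of the requested object without the dot."""
    return posixpath.splitext(path)[1].lstrip(".")


sample_path = "/acct123/live/channel-1/segment_001.ts"  # hypothetical path
account_number = first_sub_path(sample_path)
extension = file_extension(sample_path)
print(account_number, extension)
```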

And with a dictionary file Paramount is able to apply real-time lookups to enrich their incoming data, improving indexing and reducing the need to apply expensive joins at query-time.

Technically, the dictionary lookup returns a tuple, and we then extract the different columns from that tuple.
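
A minimal Python sketch of that tuple-based lookup is shown below. The dictionary contents, the key (stream_id), and the choice of enriched columns (stream, workflow, environment, which the mapping table lists as Visor Metadata) are assumptions for illustration; the real lookup runs inside Hydrolix against a dictionary file.

```python
# Hypothetical in-memory dictionary: stream_id -> (stream, workflow, environment).
STREAM_DICTIONARY = {
    "XXXXXXX": ("example-stream", "example-workflow", "prod"),
}

# Returned when the key is missing, so every row still gets all columns.
DEFAULT_ENTRY = ("", "", "")


def enrich(row):
    """Look up the row's stream_id and unpack the tuple into columns."""
    stream, workflow, environment = STREAM_DICTIONARY.get(
        row["stream_id"], DEFAULT_ENTRY
    )
    return {**row, "stream": stream, "workflow": workflow, "environment": environment}


enriched_row = enrich({"stream_id": "XXXXXXX", "status_code": 200})
print(enriched_row)
```

Doing the lookup once at ingest is what removes the need for a join at query time: the labels are already stored next to each row.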

A basic SQL arithmetic expression lets us calculate server_ttlb as the sum request_end_time_msec + transfer_time_msec.

Finally, we set auto_receive_time using the now() time function.
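
These last two steps can be sketched in Python as follows. The field names request_end_time_msec, transfer_time_msec, server_ttlb and auto_receive_time come from the text; the helper name, the sample values, and the use of a UTC timestamp in place of SQL now() are assumptions.

```python
from datetime import datetime, timezone


def add_timing_columns(row, now=None):
    """Add server_ttlb and auto_receive_time to a copy of the row."""
    receive_time = now if now is not None else datetime.now(timezone.utc)
    enriched = dict(row)
    # server_ttlb is the sum of the two Akamai timing fields, in milliseconds.
    enriched["server_ttlb"] = row["request_end_time_msec"] + row["transfer_time_msec"]
    # Stands in for now() evaluated at ingest time.
    enriched["auto_receive_time"] = receive_time
    return enriched


sample = {"request_end_time_msec": 120, "transfer_time_msec": 35}  # hypothetical
timed = add_timing_columns(sample)
print(timed["server_ttlb"])
```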

Summary

The Paramount PoC team estimates that they reduced their infrastructure spend by over 50% and increased their data retention period 6x.

Another Paramount use case simplified their logging pipeline from a complex, multi-step process to a single autoscaling platform, and enabled them to do complex data enrichment in realtime.

In our next blog post we’ll discuss how Paramount’s query and dashboard performance has improved thanks to Hydrolix.
