
Frequently asked questions

Your questions about data ingestion and batch processing answered

GENERAL frequently asked questions

  • Hydrolix is a cloud data platform that combines real-time stream processing, indexed search, and decoupled storage to process terabyte-scale transaction data at a 4x lower cost than similar systems.

  • Hydrolix is purpose-built to make big data cost-effective to collect, retain, and query. If you are currently working with or planning to work with datasets at volumes of 1 TB/day or greater, then Hydrolix is an excellent choice.

  • Hydrolix can be deployed on Amazon Web Services, Google Cloud Platform, Microsoft Azure, and Akamai Connected Cloud.

  • As a streaming data lake, Hydrolix brings a unique design to the challenges of real-time processing of high volumes of event data. Hydrolix combines stream processing, indexed search, advanced compression techniques, and decoupled storage into a single, stateless architecture. This design delivers a combination of high performance, longer hot data retention, and low cost.

    The ingest system can handle both event streams (via Kafka, Kinesis, and HTTP streaming) and batch loading.

    The query system scales independently, and specific workloads can be partitioned into query pools to avoid resource contention and to let teams assign compute resources in accordance with their own budget and performance goals.

    Hydrolix uses object storage (rather than block storage) to keep log storage costs low and extend hot data retention windows. Patented compression techniques further reduce your data footprint and log storage costs.

    Hydrolix’s approach to partitioning data and its default of indexing every column, together with customizable query pools, deliver SSD performance at object storage prices.

  • For maximum flexibility, Hydrolix is designed to work out of the box with popular data visualization tools like Grafana, Redash, Superset, and Looker.

  • Here are the estimated costs for running Hydrolix on Amazon EKS, ingesting two terabytes of raw data per day and assuming 12 months of data retention and 15x compression.

    Estimated Monthly Cloud Provider Expenses
    Core Platform: $2,237
    Ingest Capacity: $1,470
    Query Capacity: $1,960
    Cloud Storage: $1,146

    Cloud Provider Total: $6,814 / mo
    Hydrolix License & Support: $5,840 / mo

    Total Cost of Ownership: $12,654 / mo
    Effective Cost: $0.20 / GB

    You can use this online pricing calculator to get a cost estimate.
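
    The arithmetic behind this estimate can be reproduced with a short Python sketch. The dollar figures are copied from the list above. The unit conventions (1 TB = 1,024 GB, a month as 365/12 days) and the $0.023 per GB-month storage price are assumptions chosen because they reproduce the stated numbers; they are not values confirmed by Hydrolix.

```python
# Worked arithmetic for the cost estimate. Dollar figures come from the
# estimate; unit conventions and the storage price are assumptions.
GB_PER_TB = 1024                      # assumption: binary terabytes
DAYS_PER_MONTH = 365 / 12             # assumption: average month length
RAW_TB_PER_DAY = 2
RETENTION_MONTHS = 12
COMPRESSION_RATIO = 15
STORAGE_PRICE_PER_GB_MONTH = 0.023    # assumption: object storage list price

LINE_ITEMS = {"core": 2237, "ingest": 1470, "query": 1960, "storage": 1146}
CLOUD_TOTAL = 6814                    # as stated in the estimate
LICENSE_AND_SUPPORT = 5840


def monthly_raw_gb() -> float:
    """Raw data ingested per month, in GB."""
    return RAW_TB_PER_DAY * GB_PER_TB * DAYS_PER_MONTH


def stored_gb_at_steady_state() -> float:
    """Compressed data held once the full retention window is populated."""
    return monthly_raw_gb() * RETENTION_MONTHS / COMPRESSION_RATIO


def monthly_storage_cost() -> float:
    """Storage cost per month at the assumed per-GB price."""
    return stored_gb_at_steady_state() * STORAGE_PRICE_PER_GB_MONTH


def effective_cost_per_gb() -> float:
    """Total monthly cost divided by raw GB ingested per month."""
    return (CLOUD_TOTAL + LICENSE_AND_SUPPORT) / monthly_raw_gb()


if __name__ == "__main__":
    print(f"Stored at steady state: {stored_gb_at_steady_state():,.0f} GB")
    print(f"Storage cost: ${monthly_storage_cost():,.0f} / mo")
    print(f"Effective cost: ${effective_cost_per_gb():.2f} / GB")
```

    Under these assumptions the storage line comes out at roughly $1,146 per month and the effective cost at roughly $0.20 per ingested GB. The four line items sum to $6,813, one dollar below the stated cloud provider total, which is presumably a rounding difference.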

  • Hydrolix offers a free trial, allowing developers to experience the platform’s capabilities firsthand.

    Explore Hydrolix’s features, assess its suitability for your projects, and gain insights into its performance and benefits.

  • Yes, there is a SaaS version of Hydrolix. We work with multiple cloud vendors to provide the SaaS version.

    Please get in touch with the Hydrolix sales team, and they will guide you.

  • Yes, Hydrolix is designed to accommodate multi-CDN monitoring seamlessly. You can learn how our client, Paramount, uses Hydrolix for that purpose.

  • Yes. Hydrolix offers several advantages: high-performance querying, cost-efficient data retention, and seamless scalability.

    Hydrolix’s architecture and features make it a compelling choice for organizations seeking improved data management and analytics capabilities beyond what the ELK stack provides.

  • Vendor lock-in is not a concern with Hydrolix, as your data remains within your architecture.

    For cloud deployments, your data resides inside your chosen object storage environment, such as Amazon S3 or Azure Blob Storage.

    This flexibility ensures you can easily migrate your data, offering peace of mind and data sovereignty.

  • Cloud and on-premises offerings all adhere to the following security configurations:

    • Role-based access control (RBAC) to limit access to project data.
    • Strict data separation between projects – You can have multiple projects, so you can limit access to teams and individuals as needed.
    • SOC 2 compliance to ensure that all data is stored and processed securely.
    • GDPR compliance.
    • TLS for data in transit – To retrieve data from cloud storage, Hydrolix clusters use token-based authentication over TLS.

COLLECT frequently asked questions

  • Hydrolix supports streaming data ingestion through the Stream API and batch ingestion through the Batch API. There are also special connectors for Apache Kafka and AWS Kinesis.

    Check out data collection in the Hydrolix documentation.

  • Yes, Hydrolix integrates with Apache Kafka. Hydrolix Projects and Tables can continuously ingest data from one or more Kafka-based streaming sources.

  • Yes, Hydrolix supports AWS Kinesis. You can ingest data into Hydrolix with AWS Kinesis, making it easy to work with your real-time streaming data.

  • Yes, Hydrolix fully supports batch processing. This collection feature allows you to efficiently load data from a storage bucket into a target table, ensuring you can work with your data at scale.

    Hydrolix offers two mechanisms for batch ingestion: the Batch Job API, which handles one-off tasks for loading one or more files based on job configurations, and Batch Auto-Ingest, which continuously ingests new files arriving in a storage bucket.

    Supported data formats include CSV and JSON. Please note that Hydrolix requires read permissions to access external storage buckets for batch ingestion.

QUERY frequently asked questions

  • Hydrolix improves query efficiency compared to other cloud data platforms through its unique architecture. Our decoupled and stateless design separates ingest and query resources from storage, allowing us to focus on efficiently handling high-cardinality and high-dimensionality data.

    Here’s how our architecture achieves query efficiency:

    • Scalable Query Pools: Hydrolix enables you to scale query resources independently, ensuring consistently low-latency queries as your data workload grows.
    • Partition Metadata: We utilize partition metadata to speed up time-based queries, which is particularly beneficial for time-series data analysis.
    • Full Column Indexing: Hydrolix leverages full-column indexing, which optimizes query performance by swiftly locating the necessary data.
    • Predicate Pushdown: Our platform efficiently filters datasets using predicate pushdown, further enhancing query efficiency.
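
    As a rough illustration of the partition metadata and predicate pushdown points above, the following Python sketch skips partitions whose time range cannot match the query and applies the row filter inside the scan. The `Partition` class and the `prune` and `scan` functions are hypothetical names used for illustration only; this is a toy model, not Hydrolix's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Partition:
    """A chunk of rows plus the min/max timestamps recorded as metadata."""
    name: str
    min_ts: int
    max_ts: int
    rows: List[dict]


def prune(partitions: List[Partition], start: int, end: int) -> List[Partition]:
    """Keep only partitions whose time range overlaps [start, end).

    Only metadata is inspected here; no rows are read.
    """
    return [p for p in partitions if p.max_ts >= start and p.min_ts < end]


def scan(
    partitions: List[Partition],
    start: int,
    end: int,
    predicate: Callable[[dict], bool],
) -> Iterator[dict]:
    """Read surviving partitions and filter rows during the scan,
    so only matching rows are handed back to the caller."""
    for partition in prune(partitions, start, end):
        for row in partition.rows:
            if start <= row["ts"] < end and predicate(row):
                yield row


PARTITIONS = [
    Partition("p0", 0, 99, [{"ts": 10, "status": 200}, {"ts": 90, "status": 500}]),
    Partition("p1", 100, 199, [{"ts": 120, "status": 500}, {"ts": 150, "status": 200}]),
    Partition("p2", 200, 299, [{"ts": 250, "status": 500}]),
]

if __name__ == "__main__":
    errors = list(scan(PARTITIONS, 100, 200, lambda r: r["status"] == 500))
    print(errors)
```

    For the time window 100 to 200, only `p1` is opened; `p0` and `p2` are ruled out from their metadata alone, and the status filter runs while `p1` is being read.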

  • Because Hydrolix query infrastructure is decoupled from storage and collection, you can quickly scale query pools up or down. Small query pools give you consistent low-cost queries, while large pools give you consistent low-latency queries.

    You can create separate query pools for different groups of users. For example, you might configure separate sandboxes for administrator, interactive analyst, and monitoring queries.

    Query pools scale independently, so the capacity of each pool can adjust automatically to satisfy demand. You can even scale an entire pool to zero when demand is negligible, for example over the weekend when staff do not need to access data. When demand returns, you can scale the pool back up within minutes.
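
    The scaling behavior described above can be pictured with a small Python sketch. The `desired_replicas` function, its parameters, and the pool names are hypothetical and exist only to illustrate independent, demand-driven sizing with scale-to-zero; they do not correspond to Hydrolix configuration settings.

```python
import math


def desired_replicas(active_queries: int, queries_per_replica: int, max_replicas: int) -> int:
    """Toy sizing rule: enough replicas for current demand, capped at a
    per-pool maximum, and zero replicas when there is no demand."""
    if active_queries <= 0:
        return 0
    needed = math.ceil(active_queries / queries_per_replica)
    return min(max_replicas, needed)


# Each pool is sized on its own demand and its own cap (illustrative values).
POOLS = {
    "monitoring": {"active_queries": 9, "queries_per_replica": 4, "max_replicas": 10},
    "analyst": {"active_queries": 0, "queries_per_replica": 2, "max_replicas": 20},
}

if __name__ == "__main__":
    for name, pool in POOLS.items():
        print(name, desired_replicas(**pool))
```

    In this sketch the idle analyst pool drops to zero replicas while the monitoring pool keeps the capacity it needs, which mirrors the weekend scenario described above.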

  • Hydrolix uses an ANSI-compliant SQL interface. This interface uses the syntax and some of the SQL engine of ClickHouse. All standard features are supported, including the way the interface API works for querying data.

RETAIN frequently asked questions

  • Hydrolix provides a cost-effective data retention solution with patented high-density compression technology. This technology lets you keep all your data online without offloading or sampling. And because of reduced storage costs, you can retain data for analysis, compliance, and security, eliminating the trade-off between data retention and cost.

    You also get the additional benefit of reducing your environmental footprint by reducing the storage infrastructure required to store massive datasets.

  • In the context of Hydrolix, zero-egress means that when you deploy our solution on-premises, you have complete control over your data, and no additional egress costs are incurred.

    This allows you to manage your data efficiently and cost-effectively within your own infrastructure.

  • No, Hydrolix doesn’t distinguish between hot, warm, or cold data. Queries against all data, whether minutes or years old, deliver sub-second performance, ensuring all of your data remains accessible.

    Because Hydrolix combines high-density compression technology with decoupled storage, the cost to deliver low-latency queries on your data, regardless of age, is 4x lower than other databases.
