RSS

Translating Elastic Queries to SQL With KETCHUP

Learn about KETCHUP, an open source tool for translating Elastic queries to SQL and using Kibana with observability tools other than Elastic.

Alex Cruise

Franz Knupfer

Published:

Oct 24, 2023

5 minute read
,

Convert Elastic Queries to SQL

Kibana is a powerful open source visualization tool with one major downside—it’s only available to use with Elasticsearch. With the emergence of many other observability data platforms (including Hydrolix), both the Kibana community and ops teams would benefit from being able to use Kibana visualizations with other platforms.

Hydrolix is contributing to the open source community with KETCHUP (Kibana Elastic Translation ClickHouse User Proxy), a tool that makes it easier to plug in and view other data sources, including Hydrolix, in Kibana. While we love working with Grafana and other data visualization solutions, it’ll be great to add Kibana as an option for viewing alternative sources of data.

This post demonstrates how to use the KETCHUP CLI tool to translate several kinds of Elastic queries to SQL. In a future post, you’ll learn how to place a reverse proxy server between Kibana and Elasticsearch—and learn about future directions for the project.

Contribute to KETCHUP open source

Interested in contributing to KETCHUP? Help connect Kibana to alternative data sources like Hydrolix, ClickHouse, and Altinity. To become a contributor and support KETCHUP’s development roadmap, file a GitHub issue on the project or even fork the code!

What is KETCHUP?

KETCHUP is primarily a reverse proxy that emulates Elasticsearch and OpenSearch APIs. Depending on how you configure KETCHUP, you can:

  • Translate queries to SQL and dispatch them to ClickHouse-compatible servers.
  • Pass requests to an underlying Elasticsearch or OpenSearch server.

KETCHUP also includes a CLI query translator.

Here is a diagram that shows the architecture for KETCHUP.

As the diagram shows, the KETCHUP proxy forwards requests from Kibana and OpenSearch dashboards as well as other Elastic clients to ClickHouse using Java Database Connectivity (JDBC). Any request that doesn’t match a forwarding rule is passed through unchanged to an existing Elastic or OpenSearch cluster, most notably the resources and metadata Kibana (or OpenSearch Dashboards) needs in order to operate.

Getting Started With KETCHUP

You’ll need a Java development kit (JDK) installed on your build machine. Versions 11 and up should work fine.

First, clone the repository:

Then run the build:

Using the KETCHUP CLI Query Translator

The KETCHUP CLI Query Translator leverages core capabilities of KETCHUP to translate many Elastic DSL, KQL, and Lucene queries to SQL, but doesn’t have any dependencies or mandatory configuration files. 

Let’s dive right in and take a look at some examples of using the CLI.

Example 1: KQL/Lucene

You can translate KQL/Lucene queries using the KETCHUP CLI’s -q option by running the following on the command line.

You should get this output:

Notes:

  • The original query doesn’t specify a size limit. LIMIT 10 is added by default, just like Elastic.
  • The original query doesn’t select any fields. The idField is selected by default.

Example 2: JSON Query DSL

You can also translate JSON Query DSL documents. Here’s an example. First, the JSON query:

Next, you can run the KETCHUP CLI to convert the JSON to SQL.

--table specifies the table that you are querying, --idField is what the ID field is called in the table, and --index is the name of the Elastic or OpenSearch index the client is expecting to query.

For more information, see the supported command line arguments for the query translator.

You should get this output:

Notes:

  • Elastic allows a single query to return both aggregation and non-aggregation results when size > 0. However, that results in extraordinarily complex SQL or the need to generate and run two queries, then reassemble the results. Instead, KETCHUP assumes that any query that requests aggregations has an implicit size: 0 parameter, which suppresses non-aggregation results.
  • KETCHUP adds SETTINGS enable_positional_arguments = 1 to make the query more readable while remaining compatible with ClickHouse SQL dialect.

Example 3: Nested Fields

Since Elasticsearch is a document-oriented system, nested fields are overwhelmingly common. Hydrolix supports nested fields too, but in real-world deployments, customers often promote a subset of fields to top-level columns, then relegate others to nested map columns. Let’s look at another query example. This query is very similar to the last one, except this example includes nested fields:

Unfortunately, without additional configuration about how users have mapped their nested fields, KETCHUP generates SQL with the paths unchanged, and you’ll get something like this, which will rarely work given typical database schemas:

To correctly translate such queries using nested field references, the KETCHUP query translator needs additional metadata. Elastic Metadata mapping files have an optional nestedFieldsConfig element that controls this behavior as shown in the next example:

Let’s take a closer look at "nestedFieldsConfig", which has three key-value pairs specified:

  • catchAllField is the name of a map column used to store any fields (nested or otherwise) that aren’t mapped to the top level.
  • topLevelFields lists the names of any additional map columns that store collections of fields relevant to a particular service or source.
  • pathSeparator specifies the delimiter used to reference nested fields.

Now you can translate the query with the elasticMetadata.json configuration like this:

Using the nestedFieldsConfig, the translator produces the following SQL, which is more likely to work with typical database schemas:

Notes:

  • geo is listed as a top-level field, so the reference to geo.city is mapped to geo['city'].
  • bar is not listed as a top-level field, so the reference to bar.bar2.bar3 is mapped to catch_all['bar_bar2_bar3']. The elements of the path are joined to a single string using the pathSeparator delimiter, then extracted from the field denoted by catchAllField.
  • Note that you can pass in Elastic metadata with the -e flag.

Summary

In this post, you learned how to use the KETCHUP CLI. In a future post, you’ll learn how to use a reverse proxy server that sits between Kibana and Elasticsearch. This server selectively dispatches user queries to ClickHouse-compatible databases like Hydrolix and allows internal Kibana queries to proceed normally.

Next Steps

Interested in contributing to KETCHUP? Help drive our development roadmap, file a GitHub issue on the project, or even fork the code!

Photo by Roberto Sorin

Share this post…

Ready to Start?

Cut data retention costs by 75%

Give Hydrolix a try or get in touch with us to learn more

Kibana is a powerful open source visualization tool with one major downside—it’s only available to use with Elasticsearch. With the emergence of many other observability data platforms (including Hydrolix), both the Kibana community and ops teams would benefit from being able to use Kibana visualizations with other platforms.

Hydrolix is contributing to the open source community with KETCHUP (Kibana Elastic Translation ClickHouse User Proxy), a tool that makes it easier to plug in and view other data sources, including Hydrolix, in Kibana. While we love working with Grafana and other data visualization solutions, it’ll be great to add Kibana as an option for viewing alternative sources of data.

This post demonstrates how to use the KETCHUP CLI tool to translate several kinds of Elastic queries to SQL. In a future post, you’ll learn how to place a reverse proxy server between Kibana and Elasticsearch—and learn about future directions for the project.

Contribute to KETCHUP open source

Interested in contributing to KETCHUP? Help connect Kibana to alternative data sources like Hydrolix, ClickHouse, and Altinity. To become a contributor and support KETCHUP’s development roadmap, file a GitHub issue on the project or even fork the code!

What is KETCHUP?

KETCHUP is primarily a reverse proxy that emulates Elasticsearch and OpenSearch APIs. Depending on how you configure KETCHUP, you can:

  • Translate queries to SQL and dispatch them to ClickHouse-compatible servers.
  • Pass requests to an underlying Elasticsearch or OpenSearch server.

KETCHUP also includes a CLI query translator.

Here is a diagram that shows the architecture for KETCHUP.

As the diagram shows, the KETCHUP proxy forwards requests from Kibana and OpenSearch dashboards as well as other Elastic clients to ClickHouse using Java Database Connectivity (JDBC). Any request that doesn’t match a forwarding rule is passed through unchanged to an existing Elastic or OpenSearch cluster, most notably the resources and metadata Kibana (or OpenSearch Dashboards) needs in order to operate.

Getting Started With KETCHUP

You’ll need a Java development kit (JDK) installed on your build machine. Versions 11 and up should work fine.

First, clone the repository:

Then run the build:

Using the KETCHUP CLI Query Translator

The KETCHUP CLI Query Translator leverages core capabilities of KETCHUP to translate many Elastic DSL, KQL, and Lucene queries to SQL, but doesn’t have any dependencies or mandatory configuration files. 

Let’s dive right in and take a look at some examples of using the CLI.

Example 1: KQL/Lucene

You can translate KQL/Lucene queries using the KETCHUP CLI’s -q option by running the following on the command line.

You should get this output:

Notes:

  • The original query doesn’t specify a size limit. LIMIT 10 is added by default, just like Elastic.
  • The original query doesn’t select any fields. The idField is selected by default.

Example 2: JSON Query DSL

You can also translate JSON Query DSL documents. Here’s an example. First, the JSON query:

Next, you can run the KETCHUP CLI to convert the JSON to SQL.

--table specifies the table that you are querying, --idField is what the ID field is called in the table, and --index is the name of the Elastic or OpenSearch index the client is expecting to query.

For more information, see the supported command line arguments for the query translator.

You should get this output:

Notes:

  • Elastic allows a single query to return both aggregation and non-aggregation results when size > 0. However, that results in extraordinarily complex SQL or the need to generate and run two queries, then reassemble the results. Instead, KETCHUP assumes that any query that requests aggregations has an implicit size: 0 parameter, which suppresses non-aggregation results.
  • KETCHUP adds SETTINGS enable_positional_arguments = 1 to make the query more readable while remaining compatible with ClickHouse SQL dialect.

Example 3: Nested Fields

Since Elasticsearch is a document-oriented system, nested fields are overwhelmingly common. Hydrolix supports nested fields too, but in real-world deployments, customers often promote a subset of fields to top-level columns, then relegate others to nested map columns. Let’s look at another query example. This query is very similar to the last one, except this example includes nested fields:

Unfortunately, without additional configuration about how users have mapped their nested fields, KETCHUP generates SQL with the paths unchanged, and you’ll get something like this, which will rarely work given typical database schemas:

To correctly translate such queries using nested field references, the KETCHUP query translator needs additional metadata. Elastic Metadata mapping files have an optional nestedFieldsConfig element that controls this behavior as shown in the next example:

Let’s take a closer look at "nestedFieldsConfig", which has three key-value pairs specified:

  • catchAllField is the name of a map column used to store any fields (nested or otherwise) that aren’t mapped to the top level.
  • topLevelFields lists the names of any additional map columns that store collections of fields relevant to a particular service or source.
  • pathSeparator specifies the delimiter used to reference nested fields.

Now you can translate the query with the elasticMetadata.json configuration like this:

Using the nestedFieldsConfig, the translator produces the following SQL, which is more likely to work with typical database schemas:

Notes:

  • geo is listed as a top-level field, so the reference to geo.city is mapped to geo['city'].
  • bar is not listed as a top-level field, so the reference to bar.bar2.bar3 is mapped to catch_all['bar_bar2_bar3']. The elements of the path are joined to a single string using the pathSeparator delimiter, then extracted from the field denoted by catchAllField.
  • Note that you can pass in Elastic metadata with the -e flag.

Summary

In this post, you learned how to use the KETCHUP CLI. In a future post, you’ll learn how to use a reverse proxy server that sits between Kibana and Elasticsearch. This server selectively dispatches user queries to ClickHouse-compatible databases like Hydrolix and allows internal Kibana queries to proceed normally.

Next Steps

Interested in contributing to KETCHUP? Help drive our development roadmap, file a GitHub issue on the project, or even fork the code!

Photo by Roberto Sorin