Kibana is a powerful open source visualization tool with one major downside—it’s only available to use with Elasticsearch. With the emergence of many other observability data platforms (including Hydrolix), both the Kibana community and ops teams would benefit from being able to use Kibana visualizations with other platforms.
Hydrolix is contributing to the open source community with KETCHUP (Kibana Elastic Translation ClickHouse User Proxy), a tool that makes it easier to plug in and view other data sources, including Hydrolix, in Kibana. While we love working with Grafana and other data visualization solutions, it’ll be great to add Kibana as an option for viewing alternative sources of data.
This post demonstrates how to use the KETCHUP CLI tool to translate several kinds of Elastic queries to SQL. In a future post, you’ll learn how to place a reverse proxy server between Kibana and Elasticsearch—and learn about future directions for the project.
Contribute to KETCHUP open source
Interested in contributing to KETCHUP? Help connect Kibana to alternative data sources like Hydrolix, ClickHouse, and Altinity. To become a contributor and support KETCHUP’s development roadmap, file a GitHub issue on the project or even fork the code!
What is KETCHUP?
KETCHUP is primarily a reverse proxy that emulates Elasticsearch and OpenSearch APIs. Depending on how you configure KETCHUP, you can:
- Translate queries to SQL and dispatch them to ClickHouse-compatible servers.
- Pass requests to an underlying Elasticsearch or OpenSearch server.
KETCHUP also includes a CLI query translator.
Here is a diagram that shows the architecture for KETCHUP.

As the diagram shows, the KETCHUP proxy forwards requests from Kibana, OpenSearch Dashboards, and other Elastic clients to ClickHouse using Java Database Connectivity (JDBC). Any request that doesn't match a forwarding rule is passed through unchanged to an existing Elastic or OpenSearch cluster, most notably requests for the resources and metadata Kibana (or OpenSearch Dashboards) needs in order to operate.
Getting Started With KETCHUP
You’ll need a Java development kit (JDK) installed on your build machine. Versions 11 and up should work fine.
First, clone the repository:
```shell
git clone https://github.com/hydrolix/ketchup.git
```
Then run the build:
```shell
cd ketchup
./gradlew clean build
```
Using the KETCHUP CLI Query Translator
The KETCHUP CLI Query Translator leverages core capabilities of KETCHUP to translate many Elastic DSL, KQL, and Lucene queries to SQL, but doesn’t have any dependencies or mandatory configuration files.
Let’s dive right in and take a look at some examples of using the CLI.
Example 1: KQL/Lucene
You can translate KQL/Lucene queries using the KETCHUP CLI’s -q option by running the following on the command line.
```shell
java -jar cli-query-translator/build/libs/cli-query-translator-all.jar \
  --table my_table \
  --idField id \
  --index my_index \
  -q "foo:bar AND baz:quux*" \
  -t time \
  -min 2023-01-01T00:00:00.000Z \
  -max 2023-01-31T23:59:59.999Z
```
You should get this output:
```sql
SELECT "id"
FROM my_table
WHERE ((("foo" = 'bar') AND ("baz" LIKE 'quux%'))
  AND ("time" >= parseDateTime64BestEffort('2023-01-01T00:00:00Z') AND "time" <= parseDateTime64BestEffort('2023-01-31T23:59:59.999Z')))
LIMIT 10
```
Notes:

- The original query doesn't specify a size limit. `LIMIT 10` is added by default, just like Elastic.
- The original query doesn't select any fields. The `idField` is selected by default.
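The term-by-term mapping in this example can be sketched in a few lines of Python. This is purely an illustrative sketch of the translation shown above, not KETCHUP's actual implementation (which runs on the JVM and handles far more of the KQL grammar); the function names here are invented for the example.

```python
# Sketch: translate a simple KQL conjunction of "field:value" terms into
# SQL predicates, mirroring the CLI output above. Wildcard values become
# LIKE patterns (* -> %); plain values become equality checks.

def kql_term_to_sql(term: str) -> str:
    """Translate one `field:value` KQL term into a SQL predicate."""
    field, value = term.split(":", 1)
    if "*" in value:
        pattern = value.replace("*", "%")
        return f"(\"{field}\" LIKE '{pattern}')"
    return f"(\"{field}\" = '{value}')"

def kql_and_to_sql(query: str) -> str:
    """Translate a KQL conjunction joined by AND into one SQL expression."""
    terms = [t.strip() for t in query.split(" AND ")]
    return "(" + " AND ".join(kql_term_to_sql(t) for t in terms) + ")"

print(kql_and_to_sql("foo:bar AND baz:quux*"))
# (("foo" = 'bar') AND ("baz" LIKE 'quux%'))
```

The output matches the `WHERE` clause the CLI produced above, minus the time-range bounds that the `-t`/`-min`/`-max` options contribute.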
Example 2: JSON Query DSL
You can also translate JSON Query DSL documents. Here’s an example. First, the JSON query:
```json
{
  "aggs": {
    "0": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "12h",
        "time_zone": "America/Vancouver"
      },
      "aggs": {
        "1": { "min": { "field": "Age" } },
        "2": { "avg": { "field": "Age" } },
        "3": { "max": { "field": "Age" } }
      }
    }
  },
  "query": {
    "query_string": {
      "default_field": "city",
      "query": "(new york city) OR (big apple)"
    }
  }
}
```
Next, you can run the KETCHUP CLI to convert the JSON to SQL.
```shell
java -jar cli-query-translator/build/libs/cli-query-translator-all.jar \
  --table my_table \
  --idField id \
  --index my_index \
  query.json
```
`--table` specifies the table that you are querying, `--idField` is what the ID field is called in the table, and `--index` is the name of the Elastic or OpenSearch index the client is expecting to query.
For more information, see the supported command line arguments for the query translator.
You should get this output:
```sql
SELECT
  COUNT(*) AS count,
  toStartOfInterval("time", INTERVAL 12 hour, 'America/Vancouver') AS time,
  MIN("Age") AS min_Age,
  AVG("Age") AS avg_Age,
  MAX("Age") AS max_Age
FROM my_table
WHERE (("city" = 'new york city') OR ("city" = 'big apple'))
GROUP BY 2
ORDER BY 2
SETTINGS enable_positional_arguments = 1
```
Notes:

- Elastic allows a single query to return both aggregation and non-aggregation results when `size > 0`. However, that results in extraordinarily complex SQL or the need to generate and run two queries, then reassemble the results. Instead, KETCHUP assumes that any query that requests aggregations has an implicit `size: 0` parameter, which suppresses non-aggregation results.
- KETCHUP adds `SETTINGS enable_positional_arguments = 1` to make the query more readable while remaining compatible with the ClickHouse SQL dialect.
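The core of this translation is turning the `date_histogram` aggregation into a ClickHouse bucketing expression. The following is a hand-written sketch of that mapping, assumed from the input and output above rather than taken from KETCHUP's source: `fixed_interval` becomes an `INTERVAL`, and `time_zone` is passed through to `toStartOfInterval`.

```python
# Sketch: map an Elastic date_histogram aggregation onto a ClickHouse
# toStartOfInterval(...) expression, as in the generated SQL above.

def date_histogram_to_sql(agg: dict) -> str:
    field = agg["field"]
    # "12h" -> "12 hour"; only a few units shown for illustration
    units = {"s": "second", "m": "minute", "h": "hour", "d": "day"}
    interval = agg["fixed_interval"]
    amount, unit = interval[:-1], units[interval[-1]]
    tz = agg.get("time_zone")
    tz_arg = f", '{tz}'" if tz else ""
    return f'toStartOfInterval("{field}", INTERVAL {amount} {unit}{tz_arg})'

print(date_histogram_to_sql({
    "field": "time",
    "fixed_interval": "12h",
    "time_zone": "America/Vancouver",
}))
# toStartOfInterval("time", INTERVAL 12 hour, 'America/Vancouver')
```

Grouping and ordering by that expression (positionally, via `GROUP BY 2` and `ORDER BY 2`) then gives one row per 12-hour bucket, with the `min`/`avg`/`max` sub-aggregations becoming ordinary SQL aggregate functions.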
Example 3: Nested Fields
Since Elasticsearch is a document-oriented system, nested fields are overwhelmingly common. Hydrolix supports nested fields too, but in real-world deployments, customers often promote a subset of fields to top-level columns, then relegate others to nested map columns. Let’s look at another query example. This query is very similar to the last one, except this example includes nested fields:
```json
{
  "aggs": {
    "0": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "12h",
        "time_zone": "America/Vancouver"
      },
      "aggs": {
        "1": { "min": { "field": "Age" } },
        "2": { "avg": { "field": "Age" } },
        "3": { "max": { "field": "Age" } }
      }
    }
  },
  "query": {
    "query_string": {
      "default_field": "geo.city",
      "query": "(new york city) OR (big apple) OR bar.bar2.bar3:foo"
    }
  }
}
```
Unfortunately, without additional configuration about how users have mapped their nested fields, KETCHUP generates SQL with the paths unchanged, and you’ll get something like this, which will rarely work given typical database schemas:
```sql
WHERE (("geo.city" = 'new york city') OR ("geo.city" = 'big apple') OR ("bar.bar2.bar3" = 'foo'))
```
To correctly translate queries that use nested field references, the KETCHUP query translator needs additional metadata. Elastic metadata mapping files have an optional `nestedFieldsConfig` element that controls this behavior, as shown in the next example:
```json
{
  "indexName": "my_index",
  "indexAliases": [],
  "primaryKeyField": "id",
  "timestampField": "time",
  "nestedFieldsConfig": {
    "catchAllField": "catch_all",
    "topLevelFields": [ "geo" ],
    "pathSeparator": "_"
  },
  "columnAliases": [],
  "columnMappings": [
    {
      "jdbcType": "VARCHAR",
      "path": [ "geo", "city" ],
      "elasticType": "Keyword",
      "columnInfo": null
    },
    {
      "jdbcType": "VARCHAR",
      "path": [ "bar", "bar2", "bar3" ],
      "elasticType": "Keyword",
      "columnInfo": null
    }
  ]
}
```
Let's take a closer look at `nestedFieldsConfig`, which has three key-value pairs specified:

```json
"nestedFieldsConfig": {
  "catchAllField": "catch_all",
  "topLevelFields": [ "geo" ],
  "pathSeparator": "_"
},
```
- `catchAllField` is the name of a `map` column used to store any fields (nested or otherwise) that aren't mapped to the top level.
- `topLevelFields` lists the names of any additional `map` columns that store collections of fields relevant to a particular service or source.
- `pathSeparator` specifies the delimiter used to reference nested fields.
Now you can translate the query with the `elasticMetadata.json` configuration like this:

```shell
java -jar cli-query-translator/build/libs/cli-query-translator-all.jar \
  --table my_table \
  --idField id \
  --index my_index \
  -e elasticMetadata.json \
  query.json
```
Using the `nestedFieldsConfig`, the translator produces the following SQL, which is more likely to work with typical database schemas:

```sql
WHERE ((geo['city'] = 'new york city') OR (geo['city'] = 'big apple') OR (catch_all['bar_bar2_bar3'] = 'foo'))
```
Notes:

- `geo` is listed as a top-level field, so the reference to `geo.city` is mapped to `geo['city']`.
- `bar` is not listed as a top-level field, so the reference to `bar.bar2.bar3` is mapped to `catch_all['bar_bar2_bar3']`. The elements of the path are joined into a single string using the `pathSeparator` delimiter, then extracted from the field denoted by `catchAllField`.
- Note that you can pass in Elastic metadata with the `-e` flag.
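The rewriting rules just described can be sketched as a small Python function. This is an assumption-labeled illustration of the behavior shown in this example, not KETCHUP's source code, and it only covers the two cases demonstrated above.

```python
# Sketch: rewrite a dotted Elastic field reference according to a
# nestedFieldsConfig-style mapping. Paths under a top-level map column
# become lookups on that column; everything else is joined with the
# path separator and looked up in the catch-all map column.

def map_nested_field(path: str, top_level: list[str],
                     catch_all: str, sep: str = "_") -> str:
    parts = path.split(".")
    if parts[0] in top_level and len(parts) > 1:
        # e.g. geo.city -> geo['city']
        return f"{parts[0]}['{sep.join(parts[1:])}']"
    # e.g. bar.bar2.bar3 -> catch_all['bar_bar2_bar3']
    return f"{catch_all}['{sep.join(parts)}']"

print(map_nested_field("geo.city", ["geo"], "catch_all"))       # geo['city']
print(map_nested_field("bar.bar2.bar3", ["geo"], "catch_all"))  # catch_all['bar_bar2_bar3']
```

Both outputs correspond to the map lookups in the translated `WHERE` clause above.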
Summary
In this post, you learned how to use the KETCHUP CLI. In a future post, you’ll learn how to use a reverse proxy server that sits between Kibana and Elasticsearch. This server selectively dispatches user queries to ClickHouse-compatible databases like Hydrolix and allows internal Kibana queries to proceed normally.
Next Steps
Interested in contributing to KETCHUP? Help drive our development roadmap, file a GitHub issue on the project, or even fork the code!
Photo by Roberto Sorin