RSS

MaxMind Geo Lookup

How to setup Hydrolix geo-lookup using MaxMind database using custom dictionary and function to simplify ingest and query.

David Sztykman

Published:

Jan 30, 2023

3 minute read
,

In this post, we’ll learn how to run MaxMind Geo Lookups directly on raw data as it is ingested and add these lookups as columns to our Hydrolix Table.
We’ll leverage Hydrolix Dictionaries (for in memory lookups), Hydrolix Data Enrichment, and custom functions to achieve this.

Install hdxcli tool

To load dictionary file and their associated configuration the fastest solution is to leverage our CLI tool:

https://github.com/hydrolix/hdxcli

You can use the following command to install the latest version:

Once you have install hdxcli tool you can create a profile ~/.hdx_cli/config.toml the configuration file contains section like the following:

The configuration can contains multiple profile environment, if nothing is specified the CLI will use the default one. To use the prod environment you can use hdxcli --profile prod

For this post we are creating a new project called maxmind where we’ll put all dictionary and function.

Load maxmind files

There are 2 main steps required to setup geolookup, the first step is to actually load the CSV files into Hydrolix cluster.

The default CSV files GeoLite2-City-Locations-en.csv doesn’t load in Hydrolix because of single quotes in the CSV file, you can create a new file in python with double quote everywhere:

After fixing the CSV file you can start loading those into Hydrolix.

The second step is to actually create dictionary definition for those, this is the example for the geoip_country_locations_en.json:

All the dictionary definition are available in our github:

https://github.com/hydrolix/transforms/tree/dev/MaxMind/dictionary

You can checkout this repository and then use the following command to create the dictionary into Hydrolix:

You should now be able to do SQL query using the MaxMind Geo ip dictionary freshly loaded into Hydrolix:

Simplify your query with function

As you can see above the query can be complex, and here we don’t even deal with IPv6 and IPv4 which are using different dictionary files.

This is an example which is looking if the IP is v4 then load the proper dictionary maxmind_geoip_city_blocks_ipv4 and if it’s v6 then maxmind_geoip_city_blocks_ipv6.

From the dictionary we load the geoname_id use for subsequent lookup.

All the functions are available into our github:

https://github.com/hydrolix/transforms/tree/dev/MaxMind/function

And to load those function with hdxcli you can use:

This allow you to simplify the query to something like the following:

Adding GeoLookup at Ingest

Hydrolix allows data enrichment at ingest and we can setup geo lookup here.

This is an example you can use in transform SQL where the lookup is done on the column cliIP:

The transform needs to include the virtual column:

The data will now contain the new column:

  • cliIP_asn -> AS number
  • cliIP_org -> AS name
  • cliIP_country -> Country
  • cliIP_state -> State
  • cliIP_city -> City

Obviously this is an example and you can rename the column to whatever you like!

Share this post…

Ready to Start?

Cut data retention costs by 75%

Give Hydrolix a try or get in touch with us to learn more

In this post, we’ll learn how to run MaxMind Geo Lookups directly on raw data as it is ingested and add these lookups as columns to our Hydrolix Table.
We’ll leverage Hydrolix Dictionaries (for in memory lookups), Hydrolix Data Enrichment, and custom functions to achieve this.

Install hdxcli tool

To load dictionary file and their associated configuration the fastest solution is to leverage our CLI tool:

https://github.com/hydrolix/hdxcli

You can use the following command to install the latest version:

Once you have install hdxcli tool you can create a profile ~/.hdx_cli/config.toml the configuration file contains section like the following:

The configuration can contains multiple profile environment, if nothing is specified the CLI will use the default one. To use the prod environment you can use hdxcli --profile prod

For this post we are creating a new project called maxmind where we’ll put all dictionary and function.

Load maxmind files

There are 2 main steps required to setup geolookup, the first step is to actually load the CSV files into Hydrolix cluster.

The default CSV files GeoLite2-City-Locations-en.csv doesn’t load in Hydrolix because of single quotes in the CSV file, you can create a new file in python with double quote everywhere:

After fixing the CSV file you can start loading those into Hydrolix.

The second step is to actually create dictionary definition for those, this is the example for the geoip_country_locations_en.json:

All the dictionary definition are available in our github:

https://github.com/hydrolix/transforms/tree/dev/MaxMind/dictionary

You can checkout this repository and then use the following command to create the dictionary into Hydrolix:

You should now be able to do SQL query using the MaxMind Geo ip dictionary freshly loaded into Hydrolix:

Simplify your query with function

As you can see above the query can be complex, and here we don’t even deal with IPv6 and IPv4 which are using different dictionary files.

This is an example which is looking if the IP is v4 then load the proper dictionary maxmind_geoip_city_blocks_ipv4 and if it’s v6 then maxmind_geoip_city_blocks_ipv6.

From the dictionary we load the geoname_id use for subsequent lookup.

All the functions are available into our github:

https://github.com/hydrolix/transforms/tree/dev/MaxMind/function

And to load those function with hdxcli you can use:

This allow you to simplify the query to something like the following:

Adding GeoLookup at Ingest

Hydrolix allows data enrichment at ingest and we can setup geo lookup here.

This is an example you can use in transform SQL where the lookup is done on the column cliIP:

The transform needs to include the virtual column:

The data will now contain the new column:

  • cliIP_asn -> AS number
  • cliIP_org -> AS name
  • cliIP_country -> Country
  • cliIP_state -> State
  • cliIP_city -> City

Obviously this is an example and you can rename the column to whatever you like!