Added in 6.7.

Kerberos support for Elasticsearch for Apache Hadoop requires Elasticsearch 6.7 or greater.

Securing Hadoop means using Kerberos. Elasticsearch supports Kerberos as an authentication method. While the use of Kerberos is not required for securing Elasticsearch, it is a convenient option for those who already deploy Kerberos to secure their Hadoop clusters. This chapter aims to explain the steps needed to set up elasticsearch-hadoop to use Kerberos authentication for Elasticsearch.

Elasticsearch for Apache Hadoop communicates with Elasticsearch entirely over HTTP. In order to support Kerberos authentication over HTTP, elasticsearch-hadoop uses the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) to negotiate which underlying authentication method to use (in this case, Kerberos) and to transmit the agreed upon credentials to the server. This authentication mechanism is performed using the HTTP Negotiate authentication standard, where a request is sent to the server and a response is received back with a payload that further advances the negotiation. Once the negotiation between the client and server is complete, the request is accepted and a successful response is returned.
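
To make the header exchange concrete, here is a minimal Java sketch of the two header values used by HTTP Negotiate. It only formats and parses header values and does not produce real Kerberos tokens. The NegotiateHeaders class and its method names are hypothetical and chosen for illustration; they are not part of elasticsearch-hadoop.

```java
import java.util.Base64;
import java.util.Optional;

// Illustrative only: hypothetical helper, not an elasticsearch-hadoop API.
final class NegotiateHeaders {
    private static final String SCHEME = "Negotiate";

    private NegotiateHeaders() {}

    // Value for the client's "Authorization" header: the scheme name
    // followed by the Base64-encoded negotiation token.
    static String authorization(byte[] token) {
        return SCHEME + " " + Base64.getEncoder().encodeToString(token);
    }

    // Reads the token from a server's "WWW-Authenticate" header value.
    // Returns empty for a bare "Negotiate" challenge (no token yet) or for
    // any other scheme. Malformed Base64 raises IllegalArgumentException.
    static Optional<byte[]> challengeToken(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        String trimmed = headerValue.trim();
        if (!trimmed.regionMatches(true, 0, SCHEME, 0, SCHEME.length())) {
            return Optional.empty();
        }
        String rest = trimmed.substring(SCHEME.length());
        if (rest.isEmpty() || !Character.isWhitespace(rest.charAt(0))) {
            return Optional.empty();
        }
        return Optional.of(Base64.getDecoder().decode(rest.trim()));
    }
}
```

In a real exchange the token bytes would come from a GSS-API security context (for example, through Java's org.ietf.jgss package), and the client would repeat the request with each new token until the server accepts it.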

Elasticsearch for Apache Hadoop makes use of Hadoop’s user management processes: the Kerberos credentials of the current Hadoop user are used when authenticating to Elasticsearch. This means that Kerberos authentication in Hadoop must be enabled in order for elasticsearch-hadoop to obtain a user’s Kerberos credentials. In the case of using an integration that does not depend on Hadoop’s runtime (e.g. Storm), additional steps may be required to ensure that a running process has Kerberos credentials available for authentication. It is recommended that you consult the documentation of each framework that you are using on how to configure security.

Setting up your environment

This documentation assumes that you have already provisioned a Hadoop cluster with Kerberos authentication enabled (required). The general process of deploying Kerberos and securing Hadoop is beyond the scope of this documentation.

Before starting, you will need to ensure that principals for your users are provisioned in your Kerberos deployment, as well as service principals for each Elasticsearch node. To enable Kerberos authentication on Elasticsearch, it must be configured with a Kerberos realm. It is recommended that you familiarize yourself with how to configure Elasticsearch Kerberos realms so that you can make appropriate adjustments to fit your deployment. You can find more information on how they work in the Elastic Stack documentation.

Additionally, you will need to configure the API Key Realm in Elasticsearch. Hadoop and other distributed data processing frameworks only authenticate with Kerberos in the process that launches a job. Once a job has been launched, the worker processes are often cut off from the original Kerberos credentials and need some other form of authentication. Hadoop services often provide mechanisms for obtaining Delegation Tokens during job submission. These tokens are then distributed to worker processes which use the tokens to authenticate on behalf of the user running the job. Elasticsearch for Apache Hadoop obtains API Keys in order to provide tokens for worker processes to authenticate with.

Connector Settings

The following settings are used to configure elasticsearch-hadoop to use Kerberos authentication:

es.security.authentication (default simple, or basic if es.net.http.auth.user is set)
Required. Similar to most Hadoop integrations, this property signals which method to use in order to authenticate with Elasticsearch. By default, the value is simple, unless es.net.http.auth.user is set, in which case it defaults to basic. The available options for this setting are simple for no authentication, basic for basic HTTP authentication, pki if relying on certificates, and kerberos if Kerberos authentication over SPNEGO should be used.
es.net.spnego.auth.elasticsearch.principal (default none)
Required if es.security.authentication is set to kerberos. Details the name of the service principal that the Elasticsearch server is running as. This will usually be of the form HTTP/node.address@REALM. Since Elasticsearch is distributed and should be using a service principal per node, you can use the _HOST pattern (like so: HTTP/_HOST@REALM) to have elasticsearch-hadoop substitute the address of the node it is communicating with at runtime. Note that elasticsearch-hadoop will attempt to reverse resolve node IP addresses to hostnames in order to perform this substitution.
es.net.spnego.auth.mutual (default false)
Optional. The SPNEGO mechanism assumes that authentication may take multiple back and forth request-response cycles for a request to be fully accepted by the server. When a request is finally accepted by the server, the response contains a payload that can be verified to ensure that the server is the principal it claims to be. Setting this to true instructs elasticsearch-hadoop to perform this mutual authentication, and to fail the response if it detects invalid credentials from the server.
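
As a rough illustration of the _HOST pattern, the following Java sketch substitutes a node's hostname into a principal pattern. It is a simplified sketch and not the connector's actual implementation: the PrincipalPatterns class and its methods are hypothetical names, and the reverse lookup shown is only one possible way to turn an IP address into a hostname.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative only: hypothetical helper, not an elasticsearch-hadoop API.
final class PrincipalPatterns {
    private static final String HOST_TOKEN = "_HOST";

    private PrincipalPatterns() {}

    // Replaces the _HOST token with the given hostname. Principals that do
    // not contain the token are returned unchanged.
    static String resolve(String principalPattern, String hostname) {
        if (!principalPattern.contains(HOST_TOKEN)) {
            return principalPattern;
        }
        return principalPattern.replace(HOST_TOKEN, hostname);
    }

    // Reverse resolves an address to a hostname before substituting it.
    // If the lookup fails, the JDK returns the textual IP address instead.
    static String resolveForAddress(String principalPattern, String address)
            throws UnknownHostException {
        String hostname = InetAddress.getByName(address).getCanonicalHostName();
        return resolve(principalPattern, hostname);
    }
}
```

For example, resolving HTTP/_HOST@REALM against a node named es-node-1.example.com (a made-up hostname) yields HTTP/es-node-1.example.com@REALM, which is why each node needs its own service principal provisioned in Kerberos.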

Kerberos on Hadoop


Before using Kerberos authentication to Elasticsearch, Kerberos authentication must be enabled in Hadoop.

Configure elasticsearch-hadoop

Elasticsearch for Apache Hadoop only needs a few settings to configure Kerberos authentication. It is best to set these properties in your core-site.xml configuration so that they can be obtained across your entire Hadoop deployment, just like you would for turning on security options for services in Hadoop.


Kerberos on YARN

When applications launch on a YARN cluster, they send along all of their application credentials to the Resource Manager process for them to be distributed to the containers. The Resource Manager has the ability to renew any tokens in those credentials that are about to expire and to cancel tokens once a job has completed. The tokens from Elasticsearch have a default lifespan of 7 days and they are not renewable. It is a best practice to configure YARN so that it is able to cancel those tokens at the end of a run in order to lower the risk of unauthorized use, and to lower the amount of bookkeeping Elasticsearch must perform to maintain them.

In order to configure YARN to allow it to cancel Elasticsearch tokens at the end of a run, you must add the elasticsearch-hadoop jar to the Resource Manager’s classpath. You can do that by placing the jar on the Resource Manager’s local filesystem, and setting the path to the jar in the YARN_USER_CLASSPATH environment variable. Once the jar is added, the Resource Manager will need to be restarted.

export YARN_USER_CLASSPATH=/path/to/elasticsearch-hadoop.jar

Additionally, the connection information for elasticsearch-hadoop should be present in the Hadoop configuration, preferably the core-site.xml. This is because when the Resource Manager cancels a token, it does not take the job configuration into account. Without the connection settings in the Hadoop configuration, the Resource Manager will not be able to communicate with Elasticsearch in order to cancel the token.

Here are a few common security properties that you will need in order for the Resource Manager to contact Elasticsearch to cancel tokens:


The addresses of some Elasticsearch nodes. These can be any nodes (or all of them) as long as they all belong to the same cluster.

Authentication must be configured as kerberos in the settings.

The name of the Elasticsearch service principal is not required for token cancellation but having the property in the core-site.xml is required for some integrations like Spark.

SSL should be enabled if you are using a secured Elasticsearch deployment.

Location on the local filesystem to reach the SSL Keystore.

Location on the local filesystem to reach the SSL Truststore.

Location on the local filesystem to reach the elasticsearch-hadoop secure store for secure settings.

Kerberos with Map/Reduce

Before launching your Map/Reduce job, you must add a delegation token for Elasticsearch to the job’s credential set. The EsMapReduceUtil utility class can be used to do this for you. Simply pass your job to it before submitting it to the cluster. Using the local Kerberos credentials, the utility will establish a connection to Elasticsearch, request an API Key, and stow the key in the job’s credential set for the worker processes to use.

Job job = Job.getInstance(getConf(), "My-Job-Name");

// Configure Job Here...

EsMapReduceUtil.initCredentials(job);

if (!job.waitForCompletion(true)) {
    return 1;
}

Creating a new job instance

EsMapReduceUtil obtains job delegation tokens for Elasticsearch

Submit the job to the cluster

You can obtain the job delegation tokens at any time during the configuration of the Job object, as long as your elasticsearch-hadoop specific configurations are set. It’s usually sufficient to do it right before submitting the job. You should only do this once per job since each call will wastefully obtain another API Key.

The utility is also compatible with the mapred API classes:

JobConf jobConf = new JobConf(getConf());

// Configure JobConf Here...

EsMapReduceUtil.initCredentials(jobConf);

JobClient.runJob(jobConf);

Creating a new job configuration

Obtain Elasticsearch delegation tokens

Submit the job to the cluster

Kerberos with Hive


Using Kerberos auth on Elasticsearch is only supported using HiveServer2.

Before using Kerberos authentication to Elasticsearch in Hive, Kerberos authentication must be enabled for Hadoop. Make sure you have done all the required steps for configuring your Hadoop cluster as well as the steps for configuring your YARN services before using Kerberos authentication for Elasticsearch.

Finally, ensure that Hive Security is enabled.

Since Hive relies on user impersonation in Elasticsearch, it is advised that you familiarise yourself with Elasticsearch authentication and authorization.

Configure user impersonation settings for Hive

Hive’s security model follows a proxy-based approach. When a client submits a query to a secured Hive server, Hive authenticates the client using Kerberos. Once Hive is sure of the client’s identity, it wraps its own identity with a proxy user. The proxy user contains the client’s simple user name, but contains no credentials. Instead, it is expected that all interactions are executed as the Hive principal impersonating the client user. This is why when configuring Hive security, one must specify in the Hadoop configuration which users Hive is allowed to impersonate:


Elasticsearch supports user impersonation, but only users from certain realm implementations can be impersonated. Most deployments of Kerberos include other identity management components like LDAP or Active Directory. In those cases, you can configure those realms in Elasticsearch to allow for user impersonation.

If you are only using Kerberos, or you are using a solution for which Elasticsearch does not support user impersonation, you must mirror your Kerberos principals to either a native realm or a file realm in Elasticsearch. When mirroring a Kerberos principal to one of these realms, set the new user’s username to just the main part of the principal name, without any realm or host information. For instance, client@REALM would just be client and someservice/ would just be someservice.
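
The naming rule for mirrored users can be sketched in a few lines of Java. This is only an illustration of the rule described above (keep the main part of the principal, drop any host and realm); the KerberosNames class is a hypothetical helper, and the host name used in the usage note is made up.

```java
// Illustrative only: hypothetical helper, not an Elasticsearch or
// elasticsearch-hadoop API.
final class KerberosNames {
    private KerberosNames() {}

    // Returns the main part of a principal such as "primary/instance@REALM":
    // everything before the first '/' or '@', whichever comes first.
    static String simpleName(String principal) {
        int end = principal.length();
        int at = principal.indexOf('@');
        if (at >= 0) {
            end = at;
        }
        int slash = principal.indexOf('/');
        if (slash >= 0 && slash < end) {
            end = slash;
        }
        return principal.substring(0, end);
    }
}
```

With this helper, client@REALM maps to client, and a service principal such as someservice/host.example.com@REALM maps to someservice. The result is the user name to create in the native or file realm.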

You can follow this step-by-step process for mirroring users:

Create End User Roles

Create a role for your end users that will be querying Hive. In this example, we will make a simple role for accessing indices that match hive-index-*. All our Hive users will end up using this role to read, write, and update indices in Elasticsearch.

PUT _security/role/hive_user_role
{
  "run_as": [],
  "cluster": ["monitor", "manage_token"],
  "indices": [
    {
      "names": [ "hive-index-*" ],
      "privileges": [ "read", "write", "manage" ]
    }
  ]
}

Our example role name is hive_user_role.

User should be able to query basic cluster information and manage tokens.

Our user will be able to perform read, write, and management operations on an index.

Create role mapping for Kerberos user principal

Now that the user role is created, we must map the Kerberos user principals to the role. Elasticsearch does not know the complete list of principals that are managed by Kerberos. As such, each principal that wishes to connect to Elasticsearch must be mapped to a list of roles that they will be granted after authentication.

POST /_security/role_mapping/hive_user_1_mapping
{
  "roles": [ "hive_user_role" ],
  "enabled": true,
  "rules": {
    "field" : { "username" : "hive.user.1@REALM" }
  }
}

We set the roles for this mapping to be our example role hive_user_role.

When the user principal hive.user.1@REALM authenticates, it will be given the permissions from the hive_user_role.
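
Because every principal needs its own mapping, it can be convenient to generate the request bodies rather than write them by hand. The following Java sketch builds a role mapping body of the same shape as the example above for a given principal. The RoleMappingBodies class is a hypothetical helper, and sending the body to Elasticsearch is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: hypothetical helper that builds the JSON body for a
// role mapping request. It escapes quotes and backslashes but not control
// characters, so treat it as a sketch rather than a JSON library.
final class RoleMappingBodies {
    private RoleMappingBodies() {}

    static String body(String principal, List<String> roles) {
        String roleList = roles.stream()
                .map(RoleMappingBodies::quote)
                .collect(Collectors.joining(", "));
        return "{ \"roles\": [ " + roleList + " ], \"enabled\": true, "
                + "\"rules\": { \"field\": { \"username\": " + quote(principal) + " } } }";
    }

    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Each generated body would then be sent to the role mapping endpoint under a mapping name of your choosing, one request per principal.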

Mirror the user to the native realm

You may not have to perform this step if you are deploying LDAP or Active Directory along with Kerberos. Elasticsearch will perform user impersonation by looking up the user names in those realms as long as the simple names (e.g. hive.user.1) on the Kerberos principals match the user names in LDAP or Active Directory exactly.

Mirroring the user to the native realm will allow Elasticsearch to accept authentication requests from the original principal as well as accept requests from Hive which is impersonating the user. You can create a user in the native realm like so:

PUT /_security/user/hive.user.1
{
  "enabled" : true,
  "password" : "swordfish",
  "roles" : [ "hive_user_role" ],
  "metadata" : {
    "principal" : "hive.user.1@REALM"
  }
}

The user name is hive.user.1, which is the simple name format of the hive.user.1@REALM principal we are mirroring.

Provide a password here for the user. This should ideally be a securely generated random password since this mirrored user is just for impersonation purposes.

Setting the user’s roles to be the example role hive_user_role.

This is not required, but setting the original principal on the user as metadata may be helpful for your own bookkeeping.

Create a role to impersonate Hive users

Once you have configured Elasticsearch with a role mapping for your Kerberos principals and native users for impersonation, you must create a role that Hive will use to impersonate those users.

PUT _security/role/hive_proxier
{
  "run_as": ["hive.user.1"]
}

Hive’s proxy role should be limited to only run as the users who will be using Hive.

Create role mapping for Hive’s service principal

Now that there are users to impersonate, and a role that can impersonate them, make sure to map the Hive principal to the proxier role, as well as any of the roles that the users it is impersonating would have. This allows the Hive principal to create and read indices, documents, or do anything else its impersonated users might be able to do. While Hive is impersonating the user, it must have these roles or else it will not be able to fully impersonate that user.

POST /_security/role_mapping/hive_hiveserver2_mapping
{
  "roles": [ "hive_user_role", "hive_proxier" ],
  "enabled": true,
  "rules": {
    "field" : { "username" : "hive/hiveserver2.address@REALM" }
  }
}

Here we set the roles to be the superset of the roles from the users we want to impersonate. In our example, the hive_user_role role is set.

The role that allows Hive to impersonate Hive end users.

The name of the Hive server principal to match against.
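
When Hive impersonates many users, the role list for its service principal is the union of every impersonated user's roles plus the proxier role. The following Java sketch shows that computation. The ProxyRoles class and the user-to-roles map are hypothetical; how you obtain the roles for each user depends on your deployment.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical helper for computing the roles to assign
// to the Hive service principal.
final class ProxyRoles {
    private ProxyRoles() {}

    // Collects every role held by any impersonated user and adds the role
    // that grants run_as. A sorted set keeps the output stable.
    static Set<String> forProxy(String proxierRole,
                                Map<String, ? extends Collection<String>> rolesByUser) {
        Set<String> roles = new TreeSet<>();
        for (Collection<String> userRoles : rolesByUser.values()) {
            roles.addAll(userRoles);
        }
        roles.add(proxierRole);
        return roles;
    }
}
```

For the example in this section, a single user hive.user.1 with the role hive_user_role and the proxier role hive_proxier produce the set containing hive_proxier and hive_user_role.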

If managing Kerberos role mappings via the APIs is not desired, they can instead be managed in a role mapping file.

Running your Hive queries

Once all user accounts are configured and all previous steps for enabling Kerberos auth in Hadoop and Hive are complete, there should be no differences in creating Hive queries from before.

Kerberos with Pig


Before using Kerberos authentication to Elasticsearch in Pig, Kerberos authentication must be enabled for Hadoop. Make sure you have done all the required steps for configuring your Hadoop cluster as well as the steps for configuring your YARN services before using Kerberos authentication for Elasticsearch.

If elasticsearch-hadoop is configured for Kerberos authentication and Hadoop security is enabled, elasticsearch-hadoop’s storage functions in Pig will automatically obtain delegation tokens for jobs when submitting them to the cluster.

Kerberos with Spark


Using Kerberos authentication in elasticsearch-hadoop for Spark has the following requirements:

  1. Your Spark jobs must be deployed on YARN. Using Kerberos authentication in elasticsearch-hadoop does not support any other Spark cluster deployments (Mesos, Standalone).
  2. Your version of Spark must be version 2.1.0 or above. It is in this version that Spark added the ability to plug in third-party credential providers to obtain delegation tokens.

Before using Kerberos authentication to Elasticsearch in Spark, Kerberos authentication must be enabled for Hadoop. Make sure you have done all the required steps for configuring your Hadoop cluster as well as the steps for configuring your YARN services before using Kerberos authentication for Elasticsearch.


Before Spark submits an application to a YARN cluster, it loads a number of credential provider implementations that are used to determine if any additional credentials must be obtained before the application is started. These implementations are loaded using Java’s ServiceLoader architecture. Thus, any jar that is on the classpath when the Spark application is submitted can offer implementations to be loaded and used. EsServiceCredentialProvider is one such implementation that is loaded whenever elasticsearch-hadoop is on the job’s classpath.

Once loaded, EsServiceCredentialProvider determines if Kerberos authentication is enabled for elasticsearch-hadoop. If it is determined that Kerberos authentication is enabled for elasticsearch-hadoop, then the credential provider will automatically obtain delegation tokens from Elasticsearch and add them to the credentials on the YARN application submission context. Additionally, in the case that the job is a long lived process like a Spark Streaming job, the credential provider is used to update or obtain new delegation tokens when the current tokens approach their expiration time.

The time that Spark’s credential providers are loaded and called depends on the cluster deploy mode when submitting your Spark app. When running in client deploy mode, Spark runs the user’s driver code in the local JVM, and launches the YARN application to oversee the processing as needed. The providers are loaded and run whenever the YARN application first comes online. When running in cluster deploy mode, Spark launches the YARN application immediately, and the user’s driver code is run from the resulting Application Master in YARN. The providers are loaded and run immediately, before any user code is executed.

Configuring the credential provider

All implementations of the Spark credential providers use settings from only a few places:

  1. The entries from the local Hadoop configuration files
  2. The entries of the local Spark configuration file
  3. The entries that are specified from the command line when the job is initially launched

Settings that are configured from the user code are not used because the provider must run once for all jobs that are submitted for a particular Spark application. User code is not guaranteed to be run before the provider is loaded. To make things more complicated, a credential provider is only given the local Hadoop configuration to determine whether it should load delegation tokens.

These limitations mean that the settings to configure elasticsearch-hadoop for Kerberos authentication need to be in specific places:

First, es.security.authentication MUST be set in the local Hadoop configuration files as kerberos. If it is not set in the Hadoop configurations, then the credential provider will assume that simple authentication is to be used, and will not obtain delegation tokens.

Secondly, all general connection settings for elasticsearch-hadoop (like es.nodes, es.net.ssl, etc.) must be specified either in the local Hadoop configuration files, in the local Spark configuration file, or from the command line. If these settings are not available here, then the credential provider will not be able to contact Elasticsearch in order to obtain the delegation tokens that it requires.

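$> bin/spark-submit \
    --class org.myproject.MyClass \
    --master yarn \
    --deploy-mode cluster \
    --jars path/to/elasticsearch-hadoop.jar \
    --conf 'spark.es.nodes=es-node-1,es-node-2,es-node-3' \
    --conf 'spark.es.net.ssl=true' \
    --conf 'spark.es.net.spnego.auth.elasticsearch.principal=HTTP/_HOST@REALM'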

An example of some connection settings specified at submission time

Be sure to include the Elasticsearch service principal.

Specifying this many configurations in the spark-submit command line is a sure-fire way to miss important settings. Thus, it is advised to set them in the cluster-wide Hadoop config.

Renewing credentials for streaming jobs

In the event that you are running a streaming job, it is best to use the cluster deploy mode to allow YARN to manage running the driver code for the streaming application.

Since streaming jobs are expected to run continuously without stopping, you should configure Spark so that the credential provider can obtain new tokens before the original tokens expire.

Configuring Spark to obtain new tokens is different from configuring YARN to renew and cancel tokens. YARN can only renew existing tokens up to their maximum lifetime. Tokens from Elasticsearch are not renewable. Instead, they have a simple lifetime of 7 days. After those 7 days elapse, the tokens are expired. In order for an ongoing streaming job to continue running without interruption, completely new tokens must be obtained and sent to worker tasks. Spark has facilities for automatically obtaining and distributing completely new tokens once the original token lifetime has ended.
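
The fixed lifetime can be illustrated with simple date arithmetic. The following Java sketch computes when a token issued at a given instant expires and whether it is time to obtain a replacement. It is not Spark's scheduling logic: the TokenLifetime class is hypothetical, and the safety margin is an assumed parameter chosen by the caller.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative only: hypothetical helper, not Spark's or
// elasticsearch-hadoop's actual token handling.
final class TokenLifetime {
    // Default lifespan of Elasticsearch tokens, as stated in the text.
    static final Duration LIFETIME = Duration.ofDays(7);

    private TokenLifetime() {}

    static Instant expiresAt(Instant issuedAt) {
        return issuedAt.plus(LIFETIME);
    }

    // True once "now" is within the safety margin of the expiration time.
    // Since the tokens cannot be renewed, the only option at that point is
    // to obtain a completely new token.
    static boolean shouldObtainNew(Instant issuedAt, Instant now, Duration margin) {
        return !now.isBefore(expiresAt(issuedAt).minus(margin));
    }
}
```

A token issued at 2024-01-01T00:00:00Z expires at 2024-01-08T00:00:00Z. With a one-day margin, a check on 2024-01-05 reports that no new token is needed yet, while a check at noon on 2024-01-07 reports that one should be obtained.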

When submitting a Spark application on YARN, users can provide a principal and keytab file to the spark-submit command. Spark will log in with these credentials instead of depending on the local Kerberos TGT Cache for the current user. In the event that any delegation tokens are close to expiring, the loaded credential providers are given the chance to obtain new tokens using the given principal and keytab before the current tokens fully expire. Any new tokens are automatically distributed by Spark to the containers on the YARN cluster.

$> bin/spark-submit \
    --class org.myproject.MyClass \
    --master yarn \
    --deploy-mode cluster \
    --jars path/to/elasticsearch-hadoop.jar \
    --principal client@REALM \
    --keytab path/to/keytab.kt \
    path/to/application.jar

YARN deployment is required for Kerberos

Use cluster deploy mode to allow for the driver to be run in the YARN Application Master

Specify the principal to run the job as

The path to the keytab that will be used to reauthenticate when credentials expire

Disabling the credential provider

When elasticsearch-hadoop is on the classpath, EsServiceCredentialProvider is ALWAYS loaded by Spark. If Kerberos authentication is enabled for elasticsearch-hadoop in the local Hadoop configuration, then the provider will attempt to load delegation tokens for Elasticsearch regardless of whether they are needed for that particular job.

It is advised that you do not add elasticsearch-hadoop libraries to jobs that are not configured to connect to or interact with Elasticsearch. This is the easiest way to avoid the confusion of unrelated jobs failing to launch because they cannot connect to Elasticsearch.

If you find yourself in a place where you cannot easily remove elasticsearch-hadoop from the classpath of jobs that do not need to interact with Elasticsearch, then you can explicitly disable the credential provider by setting a property at launch time. The property to set is dependent on your version of Spark:

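  • For Spark 2.3.0 and up: set the spark.security.credentials.elasticsearch.enabled property to false.
  • For Spark 2.1.0-2.3.0: set the spark.yarn.security.credentials.elasticsearch.enabled property to false. This property is still accepted in Spark 2.3.0+, but is marked as deprecated.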

Kerberos with Storm


Your Storm deployment should be secured, but configuring it for security is not strictly required.

Storm is not always deployed alongside a Hadoop distribution. Thus, configuring Kerberos authentication for Hadoop is not required for using Kerberos authentication to Elasticsearch on Storm.

Using Storm’s AutoCredential plugins

Storm provides a myriad of plugin interfaces that can be loaded and used to collect, update, and renew credentials over the lifetime of a running topology. elasticsearch-hadoop provides the AutoElasticsearch class which Storm can use to automatically obtain and renew Elasticsearch delegation tokens for a topology.

AutoElasticsearch implements Storm’s INimbusCredentialPlugin, IAutoCredentials, and ICredentialsRenewer interfaces. The first is used to obtain delegation tokens on Nimbus before submitting a topology. The second is used for updating the credentials on the worker nodes, and the third is used for obtaining new delegation tokens when the current tokens are close to expiring.

Configuring AutoElasticsearch

In order for the AutoElasticsearch plugin to obtain credentials, Kerberos authentication must be enabled for elasticsearch-hadoop in its settings. You must specify the setting in either the storm.yaml file or on the topology configuration.

The AutoElasticsearch plugin provides two settings for denoting the principal and keytab to be used when executing:

es.storm.autocredentials.user.principal (default none)
Required. The principal that the plugin should use for obtaining credentials for this topology. Can be set in the storm.yaml configuration or in the topology configuration.
es.storm.autocredentials.user.keytab (default none)
Required. The path to the keytab on Nimbus that will be used for logging in as the given principal. This can be set in the storm.yaml configuration or in the topology configuration. The file must exist on Nimbus.

Configuring Nimbus

Nimbus must be configured to use AutoElasticsearch as a credential plugin from the storm.yaml configuration file. It is safe to specify AutoElasticsearch in these settings even if your topology does not interact with Elasticsearch. The plugin will perform no operations unless AutoElasticsearch is explicitly enabled on the topology.

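nimbus.autocredential.plugins.classes: ["org.elasticsearch.storm.security.AutoElasticsearch"]
nimbus.credential.renewers.classes: ["org.elasticsearch.storm.security.AutoElasticsearch"]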
nimbus.credential.renewers.freq.secs: 30 

The list of auto credential plugins to be run on Nimbus when submitting a topology

The list of all the credential renewers available for Nimbus to run

The frequency at which the credential renewers on Nimbus should be executed to check and update credentials.

In order for the plugin to be loaded, elasticsearch-hadoop must be present on the Nimbus classpath. You can add it to the classpath by using an environment variable on Nimbus.

export STORM_EXT_CLASSPATH=/path/to/elasticsearch-hadoop.jar

Configuring topologies

Once Nimbus is configured, you must add AutoElasticsearch to your topology configuration in order for delegation tokens to be obtained and updated. If you do not specify it in the topology configuration, then Storm will not attempt to obtain Elasticsearch delegation tokens when the topology is submitted.

Config conf = new Config();
List<String> plugins = new ArrayList<String>();
plugins.add(AutoElasticsearch.class.getName());
conf.put(Config.TOPOLOGY_AUTO_CREDENTIALS, plugins);
conf.put(ConfigurationOptions.ES_SECURITY_AUTHENTICATION, "kerberos");
conf.put(ConfigurationOptions.ES_NET_SPNEGO_AUTH_ELASTICSEARCH_PRINCIPAL, "HTTP/elasticsearch.node.address@REALM");

Configure the topology with AutoElasticsearch as one of the auto credential plugins for the topology. This list of plugins may contain other auto credential plugins if you have need of them.

If you have not enabled Kerberos authentication for elasticsearch-hadoop in the storm.yaml configuration file, you will need to set the properties here.