Elasticsearch for Apache Hadoop runtime options

When using elasticsearch-hadoop, it is important to be aware of the following Hadoop configurations, which influence the way Map/Reduce tasks are executed and, in turn, how elasticsearch-hadoop behaves.

Important

Unfortunately, these settings need to be set up manually before the job / script configuration. elasticsearch-hadoop is called too late in the life-cycle, after the tasks have already been dispatched, and so cannot influence the execution anymore.

Speculative execution

Speculative execution is an optimization, enabled by default, that allows Hadoop to create duplicate tasks of those it considers hung or slowed down. When doing data crunching or reading resources, having duplicate tasks is harmless and means at most a waste of computation resources; however, when writing data to an external store, it can cause data corruption through duplicates or unnecessary updates. The speculative execution behavior can be triggered by external factors (such as network or CPU load, which in turn cause false positives) even in stable environments (virtualized clusters are particularly prone to this), and it has a direct impact on data. For this reason, elasticsearch-hadoop disables this optimization for data safety.

Please check your library settings and disable this feature. If you encounter more data than expected, double and triple check this setting.

Disabling Map/Reduce speculative execution

Speculative execution can be disabled for the map and reduce phases separately. We recommend disabling it in both cases by setting the following two properties to false:

mapred.map.tasks.speculative.execution
mapred.reduce.tasks.speculative.execution

One can either set the properties by name manually on the Configuration/JobConf client:

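// JobConf: turns speculative execution off for both map and reduce tasks
jobConf.setSpeculativeExecution(false);
// or, by property name on any Configuration
configuration.setBoolean("mapred.map.tasks.speculative.execution", false);
configuration.setBoolean("mapred.reduce.tasks.speculative.execution", false);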

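or by passing them as arguments on the command line. Generic -D options go after the jar (and the main class, if one is given), and are only picked up if the job's driver parses generic options, for example through ToolRunner:

$ bin/hadoop jar <jar> [<main-class>] \
                 -Dmapred.map.tasks.speculative.execution=false \
                 -Dmapred.reduce.tasks.speculative.execution=false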

Hive speculative execution

Apache Hive has its own setting for speculative execution, namely hive.mapred.reduce.tasks.speculative.execution. It is enabled by default, so change it to false in your scripts:

set hive.mapred.reduce.tasks.speculative.execution=false;

Note that the setting has been deprecated in Hive 0.10, so you might get a warning when using it. Even so, double check that speculative execution is actually disabled.

Spark speculative execution

Out of the box, Spark has speculative execution disabled. Double check that this is the case through the spark.speculation setting (false disables it, true enables it).
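Since the advice above boils down to double checking a handful of settings, a small sanity check can help before submitting a job. The following is a minimal sketch and not part of elasticsearch-hadoop: the class SpeculationCheck and its method enabledSettings are hypothetical names, and the input map is assumed to be a flattened view of whatever effective configuration you can obtain from your job, script, or Spark context. The defaults it uses are the ones described on this page (enabled for Map/Reduce and Hive, disabled for Spark).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not an elasticsearch-hadoop API. */
public class SpeculationCheck {

    // Setting name -> default value, as described in the sections above.
    private static final Map<String, Boolean> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("mapred.map.tasks.speculative.execution", true);
        DEFAULTS.put("mapred.reduce.tasks.speculative.execution", true);
        DEFAULTS.put("hive.mapred.reduce.tasks.speculative.execution", true);
        DEFAULTS.put("spark.speculation", false);
    }

    /**
     * Returns the names of the settings for which speculative execution is
     * effectively enabled, either explicitly or because the default applies.
     */
    public static List<String> enabledSettings(Map<String, String> effectiveConf) {
        List<String> enabled = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : DEFAULTS.entrySet()) {
            String raw = effectiveConf.get(entry.getKey());
            // An unset property falls back to the default listed above.
            boolean value = (raw == null)
                    ? entry.getValue()
                    : Boolean.parseBoolean(raw.trim());
            if (value) {
                enabled.add(entry.getKey());
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapred.map.tasks.speculative.execution", "false");
        // The reduce-side and Hive settings are left unset, so their
        // defaults apply and they are reported as still enabled.
        System.out.println("Speculation still enabled for: " + enabledSettings(conf));
    }
}
```

An empty result means every setting covered here is off. Only the entries relevant to the engine you actually run matter; a pure Spark job, for instance, only needs spark.speculation to be absent or false.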