I am excited to announce the release of Elasticsearch for Apache Hadoop (aka ES-Hadoop) 6.2.0 built against Elasticsearch 6.2.0.
Error Handler API Beta
Nothing is worse than starting up a processing job and having it fail 12 hours later because of a bad piece of data. Garbage data happens, and it shouldn’t kill your Hadoop jobs. When ES-Hadoop finds a problem with your data, there’s often no way for it to know whether that problem is something it can safely ignore or a serious issue.
Landing in 6.2.0 is beta support for user-provided error handlers. Users can now extend a set of public interfaces and register them with ES-Hadoop to handle data-related exceptions from different parts of the connector. Error handlers allow you to inspect a failed operation when it occurs and take some form of action based on that information.
Specifically, 6.2.0 adds support for specifying error handlers for bulk write failures. Significant portions of the bulk request handling code have been rewritten to take feedback from your handlers into account before retrying or killing your job due to failed write operations. You can read more about supplying your own error handlers for bulk write failures here!
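To give a feel for the shape of the API, here is a sketch of a handler that swallows version-conflict failures (HTTP 409) and passes everything else along. Note that this is illustrative: in a real job you would extend the connector’s bulk write handler base class, and the minimal stand-in types below (`HandlerResult`, `BulkWriteFailure`, the handler class itself) exist only to keep the example self-contained.

```java
// Illustrative sketch only: in ES-Hadoop you would extend the connector's
// bulk write error handler base class. The minimal types below stand in
// for the connector's classes so the example compiles on its own.

enum HandlerResult { HANDLED, PASS, ABORT }

/** Stand-in for a single failed bulk entry, carrying its HTTP response code. */
class BulkWriteFailure {
    private final int responseCode;
    BulkWriteFailure(int responseCode) { this.responseCode = responseCode; }
    int getResponseCode() { return responseCode; }
}

/** Drops version-conflict failures (HTTP 409); defers on anything else. */
class IgnoreConflictsHandler {
    HandlerResult onError(BulkWriteFailure entry) {
        if (entry.getResponseCode() == 409) {
            return HandlerResult.HANDLED; // swallow the conflict, keep the job alive
        }
        return HandlerResult.PASS;        // let the next handler in the chain decide
    }
}

public class HandlerSketch {
    public static void main(String[] args) {
        IgnoreConflictsHandler handler = new IgnoreConflictsHandler();
        System.out.println(handler.onError(new BulkWriteFailure(409))); // HANDLED
        System.out.println(handler.onError(new BulkWriteFailure(400))); // PASS
    }
}
```

The key idea is the return value: a handler can declare a failure handled, pass it to the next registered handler, or abort the job outright.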
A quick note: This feature is still in beta, and the APIs may change in coming releases as we extend these error handlers to other parts of the code base.
Built-In Bulk Write Error Handlers
Easy things should be easy, and that includes handling common errors. With the advent of error handlers for bulk write errors, we’ve provided a few basic handlers to get you started, including one that performs automatic retries against Elasticsearch and one that drops and logs any failures it encounters. You can take a look at the available built-in error handlers here.
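Handlers are wired in through job configuration rather than code changes. As a rough sketch (the exact property names here are illustrative, so check the reference documentation for your version), enabling the drop-and-log handler looks something like:

```
# Register the built-in drop-and-log handler for bulk write failures
es.write.rest.error.handlers = log

# Route the dropped-document messages to a logger of your choosing
es.write.rest.error.handler.log.logger.name = BulkErrors
```

Custom handlers are registered the same way, by listing a handler name and pointing a companion property at your implementation class.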
We’re looking forward to developing a few more default handlers for each type of error we support. If you have any great ideas for built-in error handler implementations that we could provide, stop by our forums and tell us about them!