August 26, 2016

Serverless Elasticsearch Curator on AWS Lambda

In this post, we demonstrate how Elastic's Infrastructure team runs Elasticsearch Curator as a serverless application on AWS Lambda. We share the rationale, the tools, and of course, the code.

Servers, Services, and Serverless

As the team responsible for managing Elastic's internal systems, we want to build great systems without increasing the burden of system management. If we deploy a server instance to manage our existing infrastructure, we are creating overhead, which ultimately makes us less effective. To keep our overhead low and our effectiveness high, we often eschew servers for services and, more recently, serverless computing.

Elastic Cloud

We use Elasticsearch to store and analyse all sorts of things. We keep system metrics from Metricbeat, logging from Puppet, even a running history of our GitHub issues to track our performance as a service team.

We use Elastic Cloud for our Elasticsearch clusters. The Cloud service gives us Elasticsearch without expanding our server footprint and, with it, our management overhead.

Serverless

We also run a selection of tasks as "serverless" processes on Lambda. This article focuses on our serverless approach to running Curator. With all the time-series and log data we collect, we certainly have a need for Curator, but it would be a shame to run servers just to host it.

Lambkin

Getting a function into Lambda in a repeatable, automated way is a reasonably complex task. We use our open-source tool Lambkin to reduce that complexity. Lambkin creates skeleton functions on demand and helps us publish, run, and schedule Lambda functions.

The Procedure

Here, we will step through a process for setting up a serverless Elasticsearch Curator system identical to the one we use internally. The example is implemented in Python, the language used for both Curator and Lambkin. To follow along, you'll need a Python environment with the "pip" command available. Deep knowledge of Python is not required, however. The solution provided will run as-is, and is configurable for your environment by editing a simple YAML file.

Let's go:

Install Lambkin and virtualenv

sudo pip install lambkin virtualenv

Configure AWS credentials and default region

If you don't already have your AWS account configured, a simple way to do it is:

sudo pip install awscli
aws configure

More detail is available in the AWS CLI documentation.

Create a new skeleton Lambda function

lambkin create serverless-curator
cd serverless-curator

In the new serverless-curator directory, you'll find some skeleton files. These files make up a valid, ready-to-run Lambda function in Python, and a context for managing any dependencies it might have.

Create the Python function

The primary file is serverless-curator.py. It contains the body of our Python function. Feel free to examine it if you're interested.

The function doesn't do much yet, but it's already possible to publish and run it on Lambda:

lambkin publish --description='Just a test.'
lambkin run

You should then see some output from Lambda, ending with a JSON object returned by the sample function. Like this:

{"from": "Python", "hello": "World"}

Replace the skeleton function

It's time to replace the example function with something more useful. Full source code for a working Curator function is provided in this Gist.

The function makes use of a YAML configuration file where you can declare index patterns across multiple Elasticsearch clusters. Be sure to create the file serverless-curator.yaml. A configuration example is also provided in the Gist.
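
To give a feel for what a configuration-driven Curator function has to decide, here is a rough sketch of the selection logic only. It is not the code from the Gist: the configuration keys ("endpoint", "indices", "prefix", "days"), the example endpoint, and both function names are hypothetical, and the dictionary stands in for the parsed YAML file. The sketch never contacts a cluster; it just works out which date-stamped index names are older than a retention period.

```python
from datetime import date, datetime, timedelta

# Stand-in for the parsed YAML file. These keys are hypothetical and
# are not the schema used by the Gist.
EXAMPLE_CONFIG = [
    {
        "endpoint": "https://cluster-one.example.com:9243",
        "indices": [
            {"prefix": "metricbeat-", "days": 14},
            {"prefix": "logstash-", "days": 30},
        ],
    },
]


def expired_indices(index_names, prefix, days, today):
    """Return names of the form '<prefix>YYYY.MM.DD' dated more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            stamp = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if stamp < cutoff:
            expired.append(name)
    return sorted(expired)


def plan_deletions(config, indices_by_endpoint, today):
    """Build {endpoint: [index names to delete]} from the config and known index names."""
    plan = {}
    for cluster in config:
        names = indices_by_endpoint.get(cluster["endpoint"], [])
        doomed = []
        for rule in cluster["indices"]:
            doomed.extend(expired_indices(names, rule["prefix"], rule["days"], today))
        plan[cluster["endpoint"]] = sorted(doomed)
    return plan


if __name__ == "__main__":
    known = {
        "https://cluster-one.example.com:9243": [
            "metricbeat-2016.08.01",
            "metricbeat-2016.08.20",
            "logstash-2016.07.01",
        ]
    }
    print(plan_deletions(EXAMPLE_CONFIG, known, date(2016, 8, 26)))
```

In a real function, the index names would come from each cluster and the deletions would be carried out by Curator. Keeping the "what should go" decision separate from the "delete it" step makes the retention rules easy to test locally before anything is published.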

Install Python requirements

Our new function requires some library packages from the Python Package Index, so we need to ensure they will be available in Lambda. Edit the requirements.txt file, changing its contents to:

certifi==2016.8.8
elasticsearch-curator==4.0.6
PyYAML==3.11

Then install the packages:

lambkin build

The function's requirements are now installed. Lambkin uses virtualenv to ensure that each function gets its own isolated dependencies.

Publish and try the new version

When publishing this time, we'll update the description and set a long timeout in case there are a lot of indices to process:

lambkin publish --description='Elasticsearch Curator' --timeout=300
lambkin run

The function returns, in JSON format, a dictionary showing which indices were deleted, if any.

Schedule the function

If you're happy with the results, Lambkin can arrange to have the function run on a regular schedule:

lambkin schedule --rate '1 hour'
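
Scheduled Lambda functions are triggered by CloudWatch Events rules, whose rate expressions take the form "rate(1 hour)" or "rate(5 minutes)". The sketch below shows how a value such as '1 hour' could be turned into such an expression. Whether Lambkin does exactly this internally is an assumption on our part, and the helper name "rate_expression" is hypothetical.

```python
VALID_UNITS = ("minute", "hour", "day")


def rate_expression(rate):
    """Turn '1 hour' or '5 minutes' into 'rate(1 hour)' or 'rate(5 minutes)'."""
    value_text, unit = rate.split()
    value = int(value_text)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    singular = unit[:-1] if unit.endswith("s") else unit
    if singular not in VALID_UNITS:
        raise ValueError("unit must be minute(s), hour(s) or day(s)")
    # The service expects a singular unit for 1 and a plural unit otherwise.
    return "rate({} {})".format(value, singular if value == 1 else singular + "s")


if __name__ == "__main__":
    print(rate_expression("1 hour"))
```

Normalizing the unit up front avoids a rule that is rejected only because "1 hours" or "5 minute" was typed.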

At this point, you have a reliable, regularly scheduled Curator job running in a serverless environment.

Now would be a great time to check your new function into version control. When you want to change the configuration, perhaps to accommodate new indices, just edit the YAML config and run "lambkin publish" again.

Wrap Up

If you'd like to remove the function from Lambda, try these commands:

lambkin list-published
lambkin unpublish --help

If you'd like to explore Lambkin further, try "lambkin --help", or come and join the conversation on GitHub.
