January 10, 2020

A Rust client for Elasticsearch (alpha)

We're happy to announce an initial alpha release of a new Rust client for Elasticsearch! You can find it on crates.io with the crate name elasticsearch and dive into the documentation to get started.

Why a Rust client?

Rust has been voted the most loved programming language for the last four years on Stack Overflow, and is gaining adoption at companies such as Microsoft and Dropbox as a fast, memory-safe alternative to C and C++ for systems programming. Indeed, there's a growing fondness for Rust within Elastic, with a Rust Guild and Slack channel to share knowledge, ideas, and projects with other like-minded folks.

The question of having an official Rust client has been raised in the past, and there are a couple of popular community clients for working with Elasticsearch. It's fair to say that Rust has been moving fast, however, and with the release of async-await to stable Rust in 1.39.0, there seemed to be no better time to see what an official client could look like.

A spacetime project was started to explore how much effort it would take to build a minimum viable Rust client, as well as to become more acquainted with the Rust language and ecosystem. As someone familiar with what it takes to build a client in a statically typed language, I am aware of the challenges in taking on such an effort. Before going into some of the design, however, let's look at a simple example.

Getting started

To demonstrate getting started with the client, we'll run against an Elasticsearch cluster hosted on Elastic Cloud. First, add the elasticsearch crate as a dependency to the Cargo.toml of your project, as well as serde and serde_json, which we'll use for serialization:

[dependencies]
elasticsearch = "7.5.1-alpha.1"
serde = "~1"
serde_json = "~1"

After going through the steps in the Elastic Cloud web console and deploying a cluster, get the authentication credentials and the cloud id from the web console and use them to create an instance of the client:

use elasticsearch::{
    auth::Credentials, http::transport::Transport, 
    params::Refresh, Elasticsearch, IndexParts,
    SearchParts,
};
use serde_json::{json, Value};
// change to cloud_id retrieved from cloud web console
let cloud_id = "cluster_name:Y2xvdWQtZW5kcG9pbnQuZXhhbXBsZSQzZGFkZjgyM2YwNTM4ODQ5N2VhNjg0MjM2ZDkxOGExYQ==";
// change to username and password retrieved from cloud web console
let credentials = Credentials::Basic("<username>".into(), "<password>".into());
let transport = Transport::cloud(cloud_id, credentials)?;
let client = Elasticsearch::new(transport);

With the client in place, index some documents with the index API:

let index_response = client
    .index(IndexParts::IndexId("tweets", "1"))
    .body(json!({
        "user": "kimchy",
        "post_date": "2009-11-15T00:00:00Z",
        "message": "Trying out Elasticsearch, so far so good?"
    }))
    .refresh(Refresh::WaitFor)
    .send()
    .await?;
if !index_response.status_code().is_success() {
    panic!("indexing document failed")
}
let index_response = client
    .index(IndexParts::IndexId("tweets", "2"))
    .body(json!({
        "user": "forloop",
        "post_date": "2020-01-08T00:00:00Z",
        "message": "Indexing with the rust client, yeah!"
    }))
    .refresh(Refresh::WaitFor)
    .send()
    .await?;
if !index_response.status_code().is_success() {
    panic!("indexing document failed")
}

serde_json's json! macro, passed to the body() function, expands JSON into a serde_json::Value, which is useful for representing a document to index into Elasticsearch. The body() function can accept any type that implements the serde::Serialize trait, however, so your own structs can be used to represent documents.

Now, search the tweets index:

let response = client
    .search(SearchParts::Index(&["tweets"]))
    .body(json!({
        "query": {
            "match": {
                "message": "Elasticsearch rust client"
            }
        }
    }))
    .send()
    .await?;

And read values from the response body:

let response_body = response.read_body::<Value>().await?;
for hit in response_body["hits"]["hits"].as_array().unwrap() {
    println!(
        "id: {}, message: '{}', score: {}",
        hit["_id"].as_str().unwrap(),
        hit["_source"]["message"].as_str().unwrap(),
        hit["_score"].as_f64().unwrap()
    );
}

The following will be printed to standard output:

id: 2, message: 'Indexing with the rust client, yeah!', score: 1.4313364
id: 1, message: 'Trying out Elasticsearch, so far so good?', score: 0.6720003

Take a look at the elasticsearch_rust_example GitHub repository for the example in its entirety.

Design

The client is largely generated from the Elasticsearch and X-Pack REST API specs within the Elasticsearch repository, first by reading the JSON specs into structs that model the specs, then by generating streams of Rust syntax tokens from this representation, using the syn and quote crates, and finally, writing the resulting Rust code to source files. There are several advantages in taking such an approach:

With such a large number of APIs, generating as much of the client as possible from an intermediate spec makes development more manageable, reducing the burden of maintenance and potential for mistakes.
It is easier to reason about the resulting code at both development and run time than it would be with, for example, well-behaved macros defined in source code to expand an intermediate representation into executable code.
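
To make the shape of that pipeline concrete, here is a minimal sketch that uses only the standard library. It is not the client's actual generator, which builds token streams with syn and quote. The ApiSpec struct, its fields, and the capitalize and generate_parts_enum functions are hypothetical names invented for illustration, and plain string formatting stands in for token generation.

```rust
/// Hypothetical, heavily simplified model of one entry in a REST API spec.
struct ApiSpec {
    /// API name, for example "search".
    name: &'static str,
    /// One entry per URL variant: the names of the `{token}` parts it takes.
    variants: Vec<Vec<&'static str>>,
}

/// Upper-cases the first character: "index" -> "Index".
fn capitalize(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Emits Rust source for a `*Parts` enum with one variant per URL variant.
/// String formatting stands in for the token streams a real generator builds.
fn generate_parts_enum(spec: &ApiSpec) -> String {
    let mut out = format!("pub enum {}Parts<'b> {{\n", capitalize(spec.name));
    for parts in &spec.variants {
        if parts.is_empty() {
            // A URL variant with no `{token}` parts, such as /_search.
            out.push_str("    None,\n");
        } else {
            // Variant name is the concatenation of the capitalized part names.
            let name: String = parts.iter().map(|p| capitalize(p)).collect();
            let fields = vec!["&'b [&'b str]"; parts.len()].join(", ");
            out.push_str(&format!("    {}({}),\n", name, fields));
        }
    }
    out.push_str("}\n");
    out
}
```

Given a spec with the three search URL variants, generate_parts_enum returns source text for an enum with None, Index, and IndexType variants, which a generator would then write to a source file.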

Heaps of APIs

Elasticsearch has some 280 APIs at present, and counting! An effective client should be structured to make it easy to discover and call them. The client exposes each API as an associated function, either on the root client, Elasticsearch, or on one of the namespaced clients, such as Cat, Indices, Slm, etc. The namespaced clients are based on the grouping of APIs within the REST API specs. All API functions are async only, and can be awaited, taking advantage of the new language features.
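
The split between root and namespaced clients can be sketched with the standard library alone. The types below are simplified stand-ins, not the crate's definitions: the base_url field and the ping_url and indices_url functions are assumptions made for illustration, and they return the target URL synchronously rather than sending an async request.

```rust
/// Hypothetical stand-in for the root client; the real one wraps a transport.
struct Elasticsearch {
    base_url: String,
}

/// Namespaced client that borrows the root client and groups the cat APIs.
struct Cat<'a> {
    client: &'a Elasticsearch,
}

impl Elasticsearch {
    fn new(base_url: &str) -> Self {
        Elasticsearch {
            base_url: base_url.trim_end_matches('/').to_string(),
        }
    }

    /// Root-level API: returns the URL a ping request would target.
    fn ping_url(&self) -> String {
        format!("{}/", self.base_url)
    }

    /// Entry point to the `cat` namespace.
    fn cat(&self) -> Cat<'_> {
        Cat { client: self }
    }
}

impl<'a> Cat<'a> {
    /// Namespaced API: returns the URL a cat indices request would target.
    fn indices_url(&self) -> String {
        format!("{}/_cat/indices", self.client.base_url)
    }
}
```

Because the namespaced client only borrows the root client, a call such as client.cat().indices_url() creates no new connection state; the namespace exists purely to make related APIs easier to discover.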

Many Elasticsearch APIs have several URL variants. Take the search API as an example; it has the following URL variants, where each {token} URL part value can be supplied by the consumer:

/_search
/{index}/_search
/{index}/{type}/_search

Putting aside the fact that the last variant is deprecated and will be removed in the future since types are going/gone, the client models API URL variants with enums. For example, the above search API variants are represented with the SearchParts enum:

#[derive(Debug, Clone, PartialEq)]
#[doc = "API parts for the Search API"]
pub enum SearchParts<'b> {
    #[doc = "No parts"]
    None,
    #[doc = "Index"]
    Index(&'b [&'b str]),
    #[doc = "Index and Type"]
    IndexType(&'b [&'b str], &'b [&'b str]),
}

Each *Parts enum has an associated function to build a relative URL path from the enum variant. Using such types allows users to be explicit about which variant they want to call, and removes the burden on the consumer of constructing the API path.
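
As a rough illustration of how such a function might build the path, the sketch below repeats the enum and adds a url() function. The function body is an assumption, not the client's generated code: it joins multiple values with commas and skips percent-encoding for brevity.

```rust
/// Simplified copy of the enum shown above.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchParts<'b> {
    None,
    Index(&'b [&'b str]),
    IndexType(&'b [&'b str], &'b [&'b str]),
}

impl<'b> SearchParts<'b> {
    /// Builds the relative URL path for the chosen variant.
    /// Assumption: multiple values are comma-joined; percent-encoding is omitted.
    pub fn url(&self) -> String {
        match self {
            SearchParts::None => "/_search".to_string(),
            SearchParts::Index(index) => format!("/{}/_search", index.join(",")),
            SearchParts::IndexType(index, ty) => {
                format!("/{}/{}/_search", index.join(","), ty.join(","))
            }
        }
    }
}
```

With this sketch, SearchParts::Index(&["tweets"]).url() yields /tweets/_search, so the caller names the variant and never assembles the path by hand.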

Builder patterns

Each API is modeled with a struct, using a consuming builder pattern to set values for the API call. All query string parameters accepted by an API are exposed as associated functions, as is a function for the request body, if the API accepts one. A send() async associated function acts as a terminal function for the builder pattern, consuming self and creating a Future that can be awaited.
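
A consuming builder of this kind can be sketched as follows. This is a hypothetical, standard-library-only illustration rather than the client's generated code: the Search struct, its fields, and the build() terminal function are invented here, and build() returns the request parts synchronously where the real send() is async and performs the HTTP call.

```rust
/// Hypothetical, simplified request builder for a search call.
#[derive(Debug, Default)]
pub struct Search {
    index: Option<String>,
    params: Vec<(&'static str, String)>,
    body: Option<String>,
}

impl Search {
    pub fn new() -> Self {
        Search::default()
    }

    /// Each setter takes `self` by value and returns it, so calls chain.
    pub fn index(mut self, index: &str) -> Self {
        self.index = Some(index.to_string());
        self
    }

    /// Query string parameter `size`.
    pub fn size(mut self, size: u32) -> Self {
        self.params.push(("size", size.to_string()));
        self
    }

    /// Query string parameter `pretty`.
    pub fn pretty(mut self, pretty: bool) -> Self {
        self.params.push(("pretty", pretty.to_string()));
        self
    }

    /// Request body, held here as already-serialized JSON text.
    pub fn body(mut self, body: &str) -> Self {
        self.body = Some(body.to_string());
        self
    }

    /// Terminal function: consumes the builder and yields (url, body).
    pub fn build(self) -> (String, Option<String>) {
        let mut url = match self.index {
            Some(index) => format!("/{}/_search", index),
            None => "/_search".to_string(),
        };
        let query: Vec<String> = self
            .params
            .iter()
            .map(|(k, v)| format!("{}={}", k, v))
            .collect();
        if !query.is_empty() {
            url.push('?');
            url.push_str(&query.join("&"));
        }
        (url, self.body)
    }
}
```

Because every function takes self by value, a finished builder cannot be reused by accident after the terminal call; the compiler rejects any further use of the moved value.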

Most Elasticsearch APIs accept a request body in the form of JSON, but a few accept a request body in the form of newline-delimited JSON (NDJSON). A Body trait models the body of an API call, with implementations to handle both JSON and NDJSON. For APIs that expect JSON, the body() associated function of the API constrains the input to a type that implements the serde::Serialize trait, and for APIs that expect NDJSON, the body() associated function constrains the input to a vector of types that implement the Body trait. These function signatures should serve to guide consumers to the correct body form, while being open to extension in the future. The Body trait abstraction also allows experimental or beta Elasticsearch APIs, which aren't modeled as structs in the client, to be called by the client. For example, the Create transform API, a beta API in Elasticsearch 7.5.1, can be called with:

let response = client
    .send(
        Method::Put,
        "_transform/ecommerce_transform",
        HeaderMap::new(),
        Option::<&Value>::None,
        Some(JsonBody::new(json!({
            "source": {
                "index": "kibana_sample_data_ecommerce",
                "query": {
                    "term": {
                        "geoip.continent_name": {
                            "value": "Asia"
                        }
                    }
                }
            },
            "pivot": {
                "group_by": {
                    "customer_id": {
                        "terms": {
                            "field": "customer_id"
                        }
                    }
                },
                "aggregations": {
                    "max_price": {
                        "max": {
                            "field": "taxful_total_price"
                        }
                    }
                }
            },
            "description": "Maximum priced ecommerce data by customer_id in Asia",
            "dest": {
                "index": "kibana_sample_data_ecommerce_transform",
                "pipeline": "add_timestamp_pipeline"
            },
            "frequency": "5m",
            "sync": {
                "time": {
                    "field": "order_date",
                    "delay": "60s"
                }
            }
        })))
    )
    .await?;
let status_code = response.status_code();
let response_body = response.read_body::<Value>().await?;
let successful = status_code.is_success() &&
    response_body["acknowledged"].as_bool().unwrap();

Future roadmap

The immediate next steps on the roadmap are towards a general availability (GA) release, ensuring that the API is easy to use, idiomatic, and importantly, fits the needs of the Rust community wanting to interact with Elasticsearch. These steps include ensuring that the behavior of the client is functionally aligned with the other official clients, for example, controlling when failed API calls can be retried, how nodes are selected to perform API calls against, etc.

One important item on the future roadmap for a client in a statically typed language like Rust is to expose requests and responses in a strongly typed fashion. This is something that we are looking into, to try to find an approach that reconciles maintaining such a large API with some of the more dynamic structural and behavioral elements of those APIs.

Thank you

A special thank you to Ashley Mannix and William Myers for taking the time to chat about their experiences working with the community Rust clients and Elasticsearch. In particular, to Ashley, for the many conversations and inspiration!

Feedback

This initial alpha release is just the beginning. Please try the client out; we would love to hear your thoughts and feedback on it, on a GitHub issue and beyond!
