ES|QL queries to Java objects

ES|QL is a query language introduced by Elasticsearch that combines a simplified syntax with the pipe operator, letting users intuitively extract and manipulate data. Version 8.13.0 of the official Java client introduced support for ES|QL queries, with a new API that makes it easy to run queries and automatically translates the results into Java objects.

Prerequisites

  • Elasticsearch version >= 8.11.0
  • Java version >= 17

Ingesting data

Before we start querying, we need some data: we'll store this CSV file in Elasticsearch using the BulkIngester utility class available in the Java client. The CSV lists books from the Amazon Books Reviews dataset, with columns given by the following header row:

Title;Description;Author;Year;Publisher;Ratings

First of all, we have to create the index to map the fields correctly:

if (!client.indices().exists(ex -> ex.index("books")).value()) {
    client.indices().create(c -> c
        .index("books")
        .mappings(mp -> mp
            .properties("title", p -> p.text(t -> t))
            .properties("description", p -> p.text(t -> t))
            .properties("author", p -> p.text(t -> t))
            .properties("year", p -> p.short_(s -> s))
            .properties("publisher", p -> p.text(t -> t))
            .properties("ratings", p -> p.halfFloat(hf -> hf))
        ));
}

Then the Java class for the books:

public record Book(
    String title,
    String description,
    String author,
    Integer year,
    String publisher,
    Float ratings
){}

We're going to use Jackson's CSV mapper to read the file, so let's configure it:

CsvMapper csvMapper = new CsvMapper();
CsvSchema schema = CsvSchema.builder()
    .addColumn("title") // same order as in the csv
    .addColumn("description")
    .addColumn("author")
    .addColumn("year")
    .addColumn("publisher")
    .addColumn("ratings")
    .setColumnSeparator(';')
    .setSkipFirstDataRow(true)
    .build();

MappingIterator<Book> iter = csvMapper
    .readerFor(Book.class)
    .with(schema)
    .readValues(new FileReader("/path/to/file/books.csv"));

Then we'll read the CSV file line by line and optimize the ingestion using the BulkIngester:

BulkIngester<Void> ingester = BulkIngester.of(bi -> bi
    .client(client)
    .maxConcurrentRequests(20)
    .maxOperations(5000));

boolean hasNext = true;
while (hasNext) {
    try {
        Book book = iter.nextValue();
        ingester.add(BulkOperation.of(b -> b
            .index(i -> i
                .index("books")
                .document(book))));
        hasNext = iter.hasNextValue();
    } catch (JsonParseException | InvalidFormatException e) {
        // ignore malformed data
    }
}

ingester.close();

Indexing takes around 15 seconds; once it's done, the books index will hold ~80K documents, ready to be queried.

ES|QL

Now it's time to extract some information from the books data. Let's say we want to find the latest reprints of Asimov's works:

String queryAuthor =
    """
    from books
    | where author == "Isaac Asimov"
    | sort year desc
    | limit 10
    """;
List<Book> queryRes = (List<Book>) client.esql()
    .query(ObjectsEsqlAdapter.of(Book.class),queryAuthor);

Thanks to the ObjectsEsqlAdapter using Book.class as the target, we can ignore what the JSON result of the ES|QL query would be and just focus on the more familiar list of books that the client returns automatically.
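
To get a feel for what the adapter spares us, recall that a raw ES|QL JSON response lists the column definitions once under `columns` and then returns each row as a plain array under `values`. The sketch below is a hand-written version of that mapping for a cut-down record. `BookRow` and `RowMapper` are hypothetical names made up for this illustration, and this is not how the client implements the adapter internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down version of Book used only for this sketch.
record BookRow(String title, String author, Integer year) {}

final class RowMapper {

    // Turns ES|QL-style column names plus row arrays into BookRow objects.
    static List<BookRow> toBooks(List<String> columns, List<List<Object>> values) {
        // Look up each column position once; rows carry values in column order.
        int title = columns.indexOf("title");
        int author = columns.indexOf("author");
        int year = columns.indexOf("year");

        List<BookRow> books = new ArrayList<>();
        for (List<Object> row : values) {
            Object y = row.get(year);
            books.add(new BookRow(
                (String) row.get(title),
                (String) row.get(author),
                y == null ? null : ((Number) y).intValue()));
        }
        return books;
    }
}
```

With the adapter, none of this positional bookkeeping is needed: the column names are matched to the target class for us.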

For those who are used to SQL queries and the JDBC interface, the client also provides the ResultSetEsqlAdapter, which can be used in the same way and returns a java.sql.ResultSet instead:

ResultSet resultSet = client.esql()
    .query(ResultSetEsqlAdapter.INSTANCE,queryAuthor);
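
Reading such a result then follows the usual JDBC pattern. As a minimal sketch, the hypothetical helper below (`ResultSets.titles` is not part of the client) collects the `title` column of every row; it assumes the returned ResultSet supports lookup by column label and that the column is named as in the index mapping above.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class ResultSets {

    // Collects the "title" column of every row, using only standard JDBC calls.
    static List<String> titles(ResultSet resultSet) throws SQLException {
        List<String> titles = new ArrayList<>();
        while (resultSet.next()) {          // advance the cursor row by row
            titles.add(resultSet.getString("title"));
        }
        return titles;
    }
}
```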

As another example, let's find the top-rated books published by Penguin Books:

String queryPublisher =
    """
    from books
    | where publisher == "Penguin"
    | sort ratings desc
    | limit 10
    | sort title asc
    """;

queryRes = (List<Book>) client.esql()
    .query(ObjectsEsqlAdapter.of(Book.class), queryPublisher);
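
A side note on the query itself: the order of the piped commands matters. The limit is applied after the first sort, so the second sort only reorders the ten rows that survived. A rough in-memory analogue using Java streams makes this visible; `RatedBook` and `PipelineOrder` are hypothetical names used only for this illustration, and Elasticsearch of course runs the real query server-side.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical, cut-down record used only for this illustration.
record RatedBook(String title, String publisher, float ratings) {}

final class PipelineOrder {

    // In-memory analogue of:
    // where publisher == ... | sort ratings desc | limit n | sort title asc
    static List<RatedBook> topRatedByTitle(List<RatedBook> books, String publisher, int n) {
        return books.stream()
            .filter(b -> publisher.equals(b.publisher()))                 // where
            .sorted(Comparator.comparing(RatedBook::ratings).reversed())  // sort ratings desc
            .limit(n)                                                     // limit
            .sorted(Comparator.comparing(RatedBook::title))               // sort title asc
            .toList();
    }
}
```

Swapping the two sorts would instead return the first n titles in alphabetical order, regardless of rating.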

The Java code to retrieve the data stays the same, since the result is again a list of books. There are exceptions, of course: for example, if a query uses the eval command to add a new column, the Java class should be modified to represent the new result.
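
As a sketch of that last case, suppose a query computes a hypothetical `age` column with eval. The query and the `BookWithAge` record below are illustrative only (neither appears in the original code), and the example assumes the adapter matches result columns to record components by name.

```java
// Hypothetical query: `age` is an illustrative computed column, not part of the dataset.
final class EvalExample {
    static final String QUERY_WITH_AGE =
        """
        from books
        | where author == "Isaac Asimov"
        | eval age = 2024 - year
        | sort age asc
        | limit 10
        """;
}

// Same components as Book, plus one for the column added by eval.
record BookWithAge(
    String title,
    String description,
    String author,
    Integer year,
    String publisher,
    Float ratings,
    Integer age
) {}
```

The target passed to the adapter would then be `BookWithAge.class` rather than `Book.class`.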

The full code for this article can be found in the official client repository. Feel free to reach out on Discuss for any questions or issues.
