Introducing the new PHP client for Elasticsearch 8

The new PHP client for Elasticsearch 8 has been rewritten from scratch. Along with adopting the PSR standards, we’ve redesigned the architecture and moved the HTTP transport layer into a separate library. The transport is also pluggable, thanks to the HTTPlug library.

Read on to explore:

  • The new architecture and features of the PHP client
  • How to consume the endpoints and manage errors using the PSR-7 standard for HTTP messages
  • How to interact with Elasticsearch using an asynchronous approach

The old client

The elasticsearch-php library is the official client for programming Elasticsearch with PHP. This library exposes all the 400+ endpoints of Elasticsearch using a main Client class. In version 7 of this library, all the endpoints are exposed using functions — for example, the index API is mapped into the method Client::index().

These functions return an associative array that is the deserialization of the HTTP response from Elasticsearch. Usually, this response is represented by a JSON message. This message is converted into an array using the json_decode() function of PHP.
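To make the deserialization step concrete, here is a minimal sketch of what json_decode() does with a JSON body. The JSON string is a hypothetical, trimmed-down payload used only for illustration, not real server output.

```php
<?php
// Hypothetical JSON body, shaped like a trimmed-down info() response.
$json = '{"name":"node-1","version":{"number":"7.17.0"}}';

// Passing true as the second argument makes json_decode() return
// associative arrays instead of stdClass objects.
$response = json_decode($json, true);

echo $response['version']['number'], PHP_EOL;
```

Nested JSON objects become nested arrays, which is why version 7 responses are read with chained array keys.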

In case of errors, the client throws an exception that depends on the issue. For instance, if the HTTP response is 404, the client throws a Missing404Exception. If you want to retrieve the HTTP response itself, you need to get the last response from the client using the following code:

$response = $client->info();
$last = $client->transport->getLastConnection()->getLastRequestInfo();
 
$request = $last['request']; // associative array of the HTTP request
var_dump($request);
 
$response = $last['response']; // associative array of the HTTP response
echo $response['status']; // 200
echo $response['body'];   // the body as string

The HTTP request and response are retrieved from the transport layer (a property of the Client) with a couple of methods: getLastConnection() and getLastRequestInfo().

This code does not offer a good developer experience: the associative array $response contains a large number of keys, which come from PHP's cURL extension.

The new client

We built the new elasticsearch-php 8 from scratch for many reasons: developer experience, new PHP standards, a more open architecture, and performance.

With about 70 million installations, we did not want to introduce many backward compatibility (BC) breaks in version 8. We took a backward-compatible approach, offering the same APIs as version 7. That means you can connect to Elasticsearch using the same code and execute an endpoint call as usual. The difference is in the response. In version 8, the response is an Elasticsearch response object that implements the PSR-7 response interface and PHP's ArrayAccess interface.

Wait a minute — isn’t that a big BC break? Fortunately, we implemented the ArrayAccess interface and you can continue to consume the response as an array, as follows:
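A minimal sketch of such a call is shown here, assuming a Composer installation (the vendor/autoload.php path) and a local node at localhost:9200, as in the other examples of this article. Note the Elastic\Elasticsearch namespace in the use statement.

```php
<?php
use Elastic\Elasticsearch\ClientBuilder;

require 'vendor/autoload.php'; // assumption: the client was installed with Composer

$client = ClientBuilder::create()
   ->setHosts(['localhost:9200']) // assumption: a local node
   ->build();
$response = $client->info();

// $response is an object, but it can still be read like an array
echo $response['version']['number'];
```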

Spot the differences: the namespace has been changed! We introduced the Elastic root namespace. The other code looks the same, but there is a big change under the hood.

As we mentioned, $response in version 8 is an object, while in version 7 it is an associative array. If you want exactly the same behavior as version 7, you can serialize the response as an array using the function $response->asArray().

We also offer asObject(), asString(), and asBool() functions to serialize the body as an object of PHP's standard class (stdClass), as a string, or as a boolean (true for a 2xx response, false otherwise).

For example, you can consume the previous info() endpoint as follows:

$client = ClientBuilder::create()
   ->setHosts(['localhost:9200'])
   ->build();
$response = $client->info();
 
echo $response['version']['number'];
echo $response->version->number; // 8.0.0
 
var_dump($response->asObject()); // response body content as stdClass object
var_dump($response->asString()); // response body as string (JSON)
var_dump($response->asBool());   // true if HTTP response code 2xx

The $response can also expose the response body as an object because it implements PHP's __get() magic method.
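The combination of array access and property access can be sketched with a small stand-in class. SketchResponse is a hypothetical name and this is not the library's implementation; it only illustrates how ArrayAccess and __get() can expose the same JSON body in both styles. The sketch assumes PHP 8.0 or later (for the mixed type).

```php
<?php
// Hypothetical stand-in for a response object; NOT the library's class.
final class SketchResponse implements ArrayAccess
{
    private array $data;    // body decoded as associative array
    private object $object; // body decoded as stdClass

    public function __construct(string $json)
    {
        $this->data = json_decode($json, true);
        $this->object = json_decode($json);
    }

    // Array-style access: $response['version']
    public function offsetExists(mixed $offset): bool
    {
        return array_key_exists($offset, $this->data);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('This sketch is read-only');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('This sketch is read-only');
    }

    // Object-style access: $response->version
    public function __get(string $name): mixed
    {
        return $this->object->$name;
    }
}

$response = new SketchResponse('{"version":{"number":"8.0.0"}}');
echo $response['version']['number'], PHP_EOL;
echo $response->version->number, PHP_EOL;
```

Both echo statements read the same value; only the access syntax differs.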

If you want to read the HTTP response, you don’t need to recover the last message from the Client object; you can just access the PSR-7 message in the $response itself, as follows:

echo $response->getStatusCode();    // 200, since $response is PSR-7
echo (string) $response->getBody(); // Response body in JSON

This is a big advantage, especially if you are working with asynchronous calls. With asynchronous programming, you cannot reliably retrieve the last response from the client, because the last response is not guaranteed to be the one you are looking for. (More on asynchronous operations later in this article.)

Endpoint parameters for autocompletion

We added autocompletion support in elasticsearch-php version 8 using the object-like arrays of the Psalm project. Psalm is a static analysis tool that lets developers decorate code with special phpDoc annotations. One of these is @psalm-type, which makes it possible to specify the key types of an associative array. We applied the Psalm type using the standard phpDoc @param. Each PHP client endpoint takes a single input parameter, the $params array. For instance, here is the docblock of the index() endpoint:

/**
    * Creates or updates a document in an index.
    *
    * @see https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-index_.html
    *
    * @param array{
    *     id: string, //  Document ID
    *     index: string, // (REQUIRED) The name of the index
    *     wait_for_active_shards: string, // Sets the number of shard copies …
    *     op_type: enum, // Explicit operation type. Defaults to `index` for requests…
    *     refresh: enum, // If `true` then refresh the affected shards to make this operation…
    *     routing: string, // Specific routing value
    *     timeout: time, // Explicit operation timeout
    *     version: number, // Explicit version number for concurrency control
    *     version_type: enum, // Specific version type
    *     if_seq_no: number, // only perform the index operation if the last operation…
    *     if_primary_term: number, // only perform the index operation if the last operation…
    *     pipeline: string, // The pipeline id to preprocess incoming documents with
    *     require_alias: boolean, // When true, requires destination to be an alias…
    *     pretty: boolean, // Pretty format the returned JSON response. (DEFAULT: false)
    *     human: boolean, // Return human readable values for statistics. (DEFAULT: true)
    *     error_trace: boolean, // Include the stack trace of returned errors. (DEFAULT: false)
    *     source: string, // The URL-encoded request definition. Useful for libraries…
    *     filter_path: list, // A comma-separated list of filters used to reduce the response.
    *     body: array, // (REQUIRED) The document
    * } $params
    */
   public function index(array $params = [])

All the parameters are specified with a name (index), including the type (string) and a comment that describes the parameter (the name of the index). The required parameters are specified using a REQUIRED note.
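The same notation can be applied to your own functions. The sketch below documents a hypothetical helper, missingRequiredKeys(), with an array shape. The helper is not part of elasticsearch-php, and the `?` suffix is Psalm's syntax for an optional key.

```php
<?php
/**
 * Hypothetical helper (not part of elasticsearch-php) whose input is
 * documented with an array shape, in the style of the index() endpoint.
 *
 * @param array{
 *     index: string,
 *     id?: string,
 *     body: array<string, mixed>
 * } $params
 *
 * @return list<string> Names of the required keys that are missing
 */
function missingRequiredKeys(array $params): array
{
    $required = ['index', 'body'];

    // array_diff() preserves keys, so re-index the result
    return array_values(array_diff($required, array_keys($params)));
}

var_dump(missingRequiredKeys(['index' => 'my-index']));
```

A static analyzer or an IDE that understands array shapes can flag a missing index or body key before the code runs; the runtime check here is only a fallback for illustration.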

You can get autocompletion in your IDE thanks to this notation. For instance, in PhpStorm you can install the free deep-assoc-completion plugin to enable autocompletion of PHP associative arrays with the @psalm-type annotation.

The deep-assoc-completion plugin is also available for Visual Studio Code, although that version is still under development.

Pluggable architecture

Another change we made in version 8 was to split the HTTP transport layer from the library. We created elastic-transport-php, a PSR-18 client for connecting to Elastic products in PHP. This library is used not only by elasticsearch-php but also by other projects such as enterprise-search-php.

This library is based on a pluggable architecture, which means you can configure it to use a specific implementation of each of its components: the HTTP client (PSR-18), the node pool, and the logger (PSR-3).

Version 8 of elasticsearch-php uses elastic-transport-php as a dependency. This means you can connect to Elasticsearch using a custom HTTP library, a custom Node Pool, or a custom logger.

We used the HTTPlug library to perform an autodiscovery of the PSR-18 and PSR-7 libraries available in a PHP application. By default, if the application does not have an HTTP library installed, we use Guzzle.

For instance, you can use the Symfony HTTP client as follows:

use Elastic\Elasticsearch\ClientBuilder;
use Symfony\Component\HttpClient\Psr18Client;

$client = ClientBuilder::create()
   ->setHttpClient(new Psr18Client())
   ->build();

Alternatively, you can use the Monolog logger library as follows:

use Elastic\Elasticsearch\ClientBuilder;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$logger = new Logger('name');
$logger->pushHandler(new StreamHandler('path/to/your.log', Logger::WARNING));
$client = ClientBuilder::create()
   ->setLogger($logger)
   ->build();

For more information on how to customize the Client, check out the Configuration page.

Connect to Elastic Cloud

Elastic Cloud is the PaaS solution offered by Elastic. To connect to Elastic Cloud, you just need the Cloud ID and the API key.

The Cloud ID can be retrieved on the My deployment page of your Elastic Cloud dashboard. The API key can be generated from the Management section in the Security page settings.

You can read the Connecting section of the PHP client documentation for more information.

Once you have collected the Cloud ID and the API key, you can use elasticsearch-php to connect to your Elastic Cloud instance, as follows:

$client = ClientBuilder::create()
   ->setElasticCloudId('insert here the Cloud ID')
   ->setApiKey('insert here the API key')
   ->build();

Security by default

If you installed Elasticsearch 8 in your infrastructure, you can use the PHP client with TLS (Transport Layer Security) enabled. Elasticsearch 8 offers security by default, which means it uses TLS to protect the communication between client and server.

To configure elasticsearch-php to connect to Elasticsearch 8, you need the certificate authority (CA) file.

You can install Elasticsearch in different ways. If you use Docker, you need to execute the following command:

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.1.0

Once you have the Docker image installed, you can execute Elasticsearch using a single-node cluster configuration, as follows:

docker network create elastic
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.1.0

These commands create a Docker network named elastic and start Elasticsearch on the default port 9200.

When you run the Docker image, a password is generated for the elastic user and printed to the terminal (you might need to scroll back a bit in the terminal to view it). You have to copy it to connect to Elasticsearch.

Now that Elasticsearch is running, we can get the http_ca.crt certificate file. Copy it from the Docker instance using the following command:

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Once we have the http_ca.crt certificate and the password copied during the start of Elasticsearch, we can use them to connect as follows:

$client = ClientBuilder::create()
   ->setHosts(['https://localhost:9200'])
   ->setBasicAuthentication('elastic', 'password copied during ES start')
   ->setCABundle('path/to/http_ca.crt')
   ->build();

Use the client in asynchronous mode

The PHP client can execute asynchronous calls for each endpoint. With version 7, you needed to specify a special future => lazy value in the client key of the parameters passed to the endpoint, as follows:

$params = [
   'index' => 'my-index',
   'client' => [
      'future' => 'lazy'
   ],
   'body' => [
       'foo' => 'bar'
   ]
];
$response = $client->index($params);

The previous example indexes the { "foo": "bar" } document in Elasticsearch using an asynchronous HTTP call. The $response is a future, rather than the actual response.

A future represents a future computation and acts like a placeholder. You can pass a future around your code like a regular object. When you need the result values, you can resolve the future. If the future has already been resolved (due to some other activity), the values are immediately available. If the future has not been resolved yet, the resolution blocks until those values become available (for example, after the API call completes).
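The placeholder behavior described above can be sketched in a few lines. LazyFuture is a hypothetical class written for this illustration, not the RingPHP implementation, and the closure stands in for the blocking HTTP call. The sketch assumes PHP 8.0 or later.

```php
<?php
// Hypothetical class for illustration; it is not the RingPHP implementation.
final class LazyFuture
{
    /** @var callable */
    private $task;
    private bool $resolved = false;
    private mixed $value = null;

    public function __construct(callable $task)
    {
        $this->task = $task;
    }

    // Runs the deferred task on the first call; later calls reuse the value.
    public function wait(): mixed
    {
        if (!$this->resolved) {
            $this->value = ($this->task)();
            $this->resolved = true;
        }
        return $this->value;
    }
}

$calls = 0;
$future = new LazyFuture(function () use (&$calls) {
    $calls++; // stands in for the blocking HTTP call
    return ['result' => 'created'];
});

// Nothing has run yet: the future is only a placeholder.
$first  = $future->wait(); // resolves (and blocks) here
$second = $future->wait(); // already resolved, value returned immediately
echo $first['result'], ' after ', $calls, ' call', PHP_EOL;
```

The task runs once, no matter how many times the future is resolved.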

In version 7, the future is actually a Promise of the RingPHP project. In version 8, if you want to use asynchronous calls, you need to install the specific adapter for your HTTP client. For instance, if you are using Guzzle 7 (the default HTTP library for elasticsearch-php), you need to install the php-http/guzzle7-adapter as follows:

composer require php-http/guzzle7-adapter

To execute an endpoint using an asynchronous call, you need to enable it using the Client::setAsync(true) function, as follows:

$client->setAsync(true);
$params = [
   'index' => 'my-index',
   'body' => [
       'foo' => 'bar'
   ]
];
$response = $client->index($params);

If you want to disable async for the next endpoint call, you need to call setAsync(false) again.

The response of an asynchronous call is a Promise object of the HTTPlug library. This Promise follows the Promises/A+ standard. A promise represents the eventual result of an asynchronous operation.

To get the response, you need to wait for it to arrive. Waiting blocks execution until the response is available, as follows:

$response = $client->index($params);
$response = $response->wait();
printf("Body response:\n%s\n", $response->asString());

The primary way of interacting with a promise is through its then method, which registers callbacks to receive either a promise’s eventual value or the reason why the promise cannot be fulfilled.

use Psr\Http\Message\ResponseInterface;

$response = $client->index($params);
$response->then(
   // The success callback
   function (ResponseInterface $response) {
       // $response is Elastic\Elasticsearch\Response\Elasticsearch
       printf("Body response:\n%s\n", $response->asString());
   },
   // The failure callback
   function (\Exception $exception) {
       echo 'Houston, we have a problem';
       throw $exception;
   }
);
$response = $response->wait();

The last $response->wait() is needed to resolve the execution call in the example above.

Less code and memory usage

The new PHP client for Elasticsearch uses less code than version 7. In particular, elasticsearch-php version 8 is composed of 6,522 lines of code plus 1,025 lines for elastic-transport-php, for a total of 7,547 lines. Version 7 had 20,715 lines of code, so version 8 is a little over one-third the size of the previous version.
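The comparison can be checked with a few lines of arithmetic, using only the line counts quoted above:

```php
<?php
$clientV8    = 6522;  // elasticsearch-php 8
$transportV8 = 1025;  // elastic-transport-php
$clientV7    = 20715; // elasticsearch-php 7

$totalV8 = $clientV8 + $transportV8;
$ratio   = $totalV8 / $clientV7;

printf("%d lines, %.1f%% of version 7\n", $totalV8, $ratio * 100);
// prints "7547 lines, 36.4% of version 7"
```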

Regarding memory usage, elasticsearch-php version 8 implements a lazy loading mechanism to optimize the loading of the API namespaces. This means that if you use only a subset of the 400+ endpoints, the specifications of the unused ones are not loaded into memory.
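One common way to get this kind of behavior is to create each namespace object only on first use and cache it afterwards. The sketch below illustrates the idea with hypothetical LazyClient and IndicesNamespace classes; it does not reproduce the library's internals.

```php
<?php
// Hypothetical classes for illustration; not the library's internals.
final class IndicesNamespace
{
    public function describe(): string
    {
        return 'indices endpoints';
    }
}

final class LazyClient
{
    /** @var array<string, object> namespace objects created so far */
    private array $loaded = [];

    // Creates the namespace object on first use and caches it.
    public function indices(): IndicesNamespace
    {
        return $this->loaded['indices'] ??= new IndicesNamespace();
    }

    /** @return list<string> */
    public function loadedNamespaces(): array
    {
        return array_keys($this->loaded);
    }
}

$client = new LazyClient();
var_dump($client->loadedNamespaces()); // empty: nothing instantiated yet
echo $client->indices()->describe(), PHP_EOL;
var_dump($client->loadedNamespaces()); // now contains "indices"
```

A client that never calls indices() never pays for that object, which is the effect the paragraph above describes.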

Wrapping up

The new PHP client for Elasticsearch 8 comes with some exciting improvements: a new architecture that preserves backward compatibility with version 7, security by default, and a simpler asynchronous mode.

The best way to get started is with Elastic Cloud. Begin your free trial of Elastic Cloud today!