Connection Pool

The connection pool is an object inside the client that is responsible for maintaining the current list of nodes. Theoretically, nodes are either dead or alive. However, in the real world, things are never so clear. Nodes are sometimes in a gray-zone of "probably dead but not confirmed", "timed-out but unclear why" or "recently dead but now alive". The job of the connection pool is to manage this set of unruly connections and try to provide the best behavior to the client.

If a connection pool is unable to find an alive node to query against, it throws a NoNodesAvailableException. This is distinct from an exception due to maximum retries. For example, your cluster may have 10 nodes. You execute a request and 9 out of the 10 nodes fail due to connection timeouts. The tenth node succeeds and the query executes. The first nine nodes are marked dead (depending on the connection pool being used) and their "dead" timers begin ticking.

When the next request is sent to the client, nodes 1-9 are still considered "dead", so they are skipped. The request is sent to the only known alive node (#10). If this node fails, a NoNodesAvailableException is thrown. Note that this happens after far fewer attempts than the retries value allows, because retries only applies to retries against alive nodes. In this case, only one node is known to be alive, so NoNodesAvailableException is thrown.
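
The scenario above can be sketched with a toy model. This is not the client's implementation: the function runRequest(), the node names, and the $retries parameter are hypothetical, and a plain RuntimeException stands in for both NoNodesAvailableException and the maximum-retries exception.

```php
<?php
// Toy model only: it mimics the behavior described above, not the client's internals.
// runRequest(), the node names, and $retries are hypothetical; RuntimeException
// stands in for the client's exceptions.

function runRequest(array &$alive, array $healthy, int $retries): string
{
    $attempts = 0;
    foreach (array_keys($alive) as $node) {
        if (!$alive[$node]) {
            continue; // dead nodes are skipped while their timers run
        }
        if ($attempts > $retries) {
            throw new RuntimeException('maximum retries reached');
        }
        $attempts++;
        if ($healthy[$node]) {
            return $node; // the request succeeded on this node
        }
        $alive[$node] = false; // the request failed: mark the node dead
    }
    throw new RuntimeException('no alive nodes available');
}

$alive = [];
$healthy = [];
for ($i = 1; $i <= 10; $i++) {
    $alive["node$i"] = true;
    $healthy["node$i"] = ($i === 10); // only node 10 responds
}

$first = runRequest($alive, $healthy, 10); // nodes 1-9 fail and are marked dead

$healthy['node10'] = false; // node 10 now fails as well
try {
    $second = runRequest($alive, $healthy, 10);
} catch (RuntimeException $e) {
    $second = $e->getMessage(); // raised after a single attempt
}
```

In this sketch the second request fails after one attempt, even though ten retries were allowed, because node 10 was the only node still considered alive.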

There are several connection pool implementations that you can choose from:

staticNoPingConnectionPool (default)

This connection pool maintains a static list of hosts which are assumed to be alive when the client initializes. If a node fails a request, it is marked as dead for 60 seconds and the next node is tried. After 60 seconds, the node is revived and put back into rotation. Each additional failed request causes the dead timeout to increase exponentially.

A successful request resets the "failed ping timeout" counter.
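
The growth of the dead timeout can be sketched as follows. Only the 60-second starting value and the exponential growth come from the description above; the doubling factor, the one-hour cap, and the function name deadTimeout() are assumptions made for illustration, not values taken from the client.

```php
<?php
// Illustrative sketch: only the 60-second base comes from the text above.
// The doubling factor and the 3600-second cap are assumptions.

function deadTimeout(int $failures, int $base = 60, int $cap = 3600): int
{
    // $failures counts consecutive failed requests (1 = first failure).
    if ($failures < 1) {
        return 0; // no failure recorded, so the node stays in rotation
    }
    return (int) min($base * 2 ** ($failures - 1), $cap);
}

foreach ([1, 2, 3, 4] as $n) {
    echo $n, ' failure(s): ', deadTimeout($n), " seconds\n";
}
```

Under these assumptions a node sits out for 60, 120, 240, and 480 seconds after its first four consecutive failures. A successful request would reset the failure count to zero.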

If you wish to explicitly set the StaticNoPingConnectionPool implementation, you may do so with the setConnectionPool() method of the ClientBuilder object:

$client = ClientBuilder::create()
            ->setConnectionPool('\Elasticsearch\ConnectionPool\StaticNoPingConnectionPool', [])
            ->build();

Note that the implementation is specified via a namespace path to the class.

staticConnectionPool

Identical to the StaticNoPingConnectionPool, except it pings nodes before they are used to determine if they are alive. This may be useful for long-running scripts but tends to be additional overhead that is unnecessary for average PHP scripts.

To use the StaticConnectionPool:

$client = ClientBuilder::create()
            ->setConnectionPool('\Elasticsearch\ConnectionPool\StaticConnectionPool', [])
            ->build();

Note that the implementation is specified via a namespace path to the class.

simpleConnectionPool

The SimpleConnectionPool returns the next node as specified by the selector; it does not track node conditions. It returns nodes whether they are dead or alive. It is a simple pool of static hosts.

The SimpleConnectionPool is recommended where the Elasticsearch deployment is located behind a (reverse) proxy or load balancer, where the individual Elasticsearch nodes are not visible to the client. This should be used when running Elasticsearch deployments on Cloud.

To use the SimpleConnectionPool:

$client = ClientBuilder::create()
            ->setConnectionPool('\Elasticsearch\ConnectionPool\SimpleConnectionPool', [])
            ->build();

Note that the implementation is specified via a namespace path to the class.
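
To illustrate what "does not track node conditions" means for this pool, here is a toy model. It is not the client's SimpleConnectionPool: the class name ToySimplePool, the round-robin selection, and the proxy URL are all assumptions chosen for the example.

```php
<?php
// Toy model, not the client's SimpleConnectionPool: hosts are handed out in
// round-robin order (an assumed selector) and are never marked dead.

final class ToySimplePool
{
    /** @var string[] */
    private array $hosts;
    private int $position = 0;

    public function __construct(array $hosts)
    {
        if ($hosts === []) {
            throw new InvalidArgumentException('at least one host is required');
        }
        $this->hosts = array_values($hosts);
    }

    public function nextConnection(): string
    {
        // No liveness check: the next host is returned unconditionally.
        $host = $this->hosts[$this->position % count($this->hosts)];
        $this->position++;
        return $host;
    }
}

$pool = new ToySimplePool(['https://proxy.example.com:9200']); // hypothetical proxy URL
$first = $pool->nextConnection();
$second = $pool->nextConnection(); // same host again, even if the first request failed
```

With a single proxy or load balancer address, every request goes to that address, and failure handling is left to the proxy.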

sniffingConnectionPool

Unlike the previous static connection pools, this one is dynamic. The user provides a seed list of hosts, which the client uses to "sniff" and discover the rest of the cluster by using the Cluster State API. As nodes are added to or removed from the cluster, the client updates its pool of active connections.

To use the SniffingConnectionPool:

$client = ClientBuilder::create()
            ->setConnectionPool('\Elasticsearch\ConnectionPool\SniffingConnectionPool', [])
            ->build();

Note that the implementation is specified via a namespace path to the class.

Custom Connection Pool

If you wish to implement your own custom Connection Pool, your class must implement ConnectionPoolInterface:

class MyCustomConnectionPool implements ConnectionPoolInterface
{

    /**
     * @param bool $force
     *
     * @return ConnectionInterface
     */
    public function nextConnection($force = false)
    {
        // code here
    }

    /**
     * @return void
     */
    public function scheduleCheck()
    {
        // code here
    }
}

You can then create an instance of your ConnectionPool and inject it into the ClientBuilder:

$myConnectionPool = new MyCustomConnectionPool();

$client = ClientBuilder::create()
            ->setConnectionPool($myConnectionPool, [])
            ->build();

If your connection pool only makes minor changes, you may consider extending AbstractConnectionPool, which provides some concrete helper methods. If you choose to go down this route, you need to make sure your ConnectionPool implementation has a compatible constructor (since it is not defined in the interface):

class MyCustomConnectionPool extends AbstractConnectionPool implements ConnectionPoolInterface
{

    public function __construct($connections, SelectorInterface $selector, ConnectionFactory $factory, $connectionPoolParams)
    {
        parent::__construct($connections, $selector, $factory, $connectionPoolParams);
    }

    /**
     * @param bool $force
     *
     * @return ConnectionInterface
     */
    public function nextConnection($force = false)
    {
        // code here
    }

    /**
     * @return void
     */
    public function scheduleCheck()
    {
        // code here
    }
}

If your constructor matches AbstractConnectionPool, you may use either object injection or namespace instantiation. The two calls are shown together for comparison; use only one of them:

$myConnectionPool = new MyCustomConnectionPool();

$client = ClientBuilder::create()
            ->setConnectionPool($myConnectionPool, [])                                      // object injection
            ->setConnectionPool('\MyProject\ConnectionPools\MyCustomConnectionPool', [])    // or namespace
            ->build();

Which connection pool to choose? PHP and connection pooling

At first glance, the sniffingConnectionPool implementation seems superior. For many languages, it is. In PHP, the conversation is a bit more nuanced.

Because PHP is a share-nothing architecture, there is no way to maintain a connection pool across script instances. This means that every script is responsible for creating, maintaining, and destroying connections every time the script is re-run.

Sniffing is a relatively lightweight operation (one API call to /_cluster/state, followed by pings to each node), but it may add non-negligible overhead for certain PHP applications. The average PHP script likely loads the client, executes a few queries, and then closes. Imagine that this script is called 1000 times per second: the sniffing connection pool performs the sniffing and pinging process 1000 times per second, which adds up to a large amount of overhead.
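
A back-of-the-envelope calculation makes this overhead concrete. The cluster size of 10 nodes is borrowed from the earlier example and the function name is hypothetical; the request count per run follows the description above (one cluster state call plus one ping per node).

```php
<?php
// Rough estimate of the extra requests caused by sniffing on every script run.

function sniffingRequestsPerSecond(int $scriptRunsPerSecond, int $nodeCount): int
{
    $requestsPerRun = 1 + $nodeCount; // one /_cluster/state call + one ping per node
    return $scriptRunsPerSecond * $requestsPerRun;
}

echo sniffingRequestsPerSecond(1000, 10), "\n"; // 11000
```

In this estimate the cluster handles 11,000 bookkeeping requests per second before any of the scripts has run a single query.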

In reality, if your script only executes a few queries, sniffing is more machinery than it needs. Sniffing tends to be more useful in long-lived processes, which can potentially "out-live" a static list of hosts.

For this reason, the default connection pool is currently the staticNoPingConnectionPool. You can, of course, change this default, but we strongly recommend that you run a load test and verify that the change does not negatively impact performance.

Quick setup

As shown above, there are several connection pool implementations available, and each has slightly different behavior (pinging vs no pinging, and so on). Connection pools are configured via the setConnectionPool() method:

$connectionPool = '\Elasticsearch\ConnectionPool\StaticNoPingConnectionPool';
$client = ClientBuilder::create()
            ->setConnectionPool($connectionPool)
            ->build();