Indexing documents

When you add documents to Elasticsearch, you index JSON documents. This maps naturally to PHP associative arrays, which encode directly to JSON. In Elasticsearch-PHP you therefore build associative arrays and pass them to the client for indexing. This section covers two ways of ingesting data into Elasticsearch: indexing a single document and bulk indexing.
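To make the array-to-JSON mapping concrete, here is a minimal sketch that uses only PHP's built-in json_encode() and no client. The field names other than testField are illustrative and not part of any required schema.

```php
<?php
// A document expressed as a PHP associative array (field names are illustrative).
$document = [
    'testField' => 'abc',
    'tags'      => ['php', 'search'],                 // list  -> JSON array
    'author'    => ['name' => 'Jane', 'active' => true], // map -> JSON object
];

// JSON_THROW_ON_ERROR (PHP 7.3+) turns encoding problems into exceptions.
$json = json_encode($document, JSON_THROW_ON_ERROR);

echo $json, PHP_EOL;
```

Sequential arrays become JSON arrays and string-keyed arrays become JSON objects, which is why nested documents can be written as nested PHP arrays.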

Single document indexing

When indexing a document, you can either provide an ID or let Elasticsearch generate one for you.


Providing an ID value.

$params = [
    'index' => 'my_index',
    'id'    => 'my_id',
    'body'  => [ 'testField' => 'abc']
];

// Document will be indexed to my_index/_doc/my_id
$response = $client->index($params);


Omitting an ID value.

$params = [
    'index' => 'my_index',
    'body'  => [ 'testField' => 'abc']
];

// Document will be indexed to my_index/_doc/<autogenerated ID>
$response = $client->index($params);
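When Elasticsearch generates the ID, you usually want to read it back from the response. The sketch below assumes the response can be read like an array and uses a hand-written sample shaped like an Index API reply. The ID and other values are made up for illustration.

```php
<?php
// Illustrative response shaped like an Index API reply; the values are made up.
$response = [
    '_index'   => 'my_index',
    '_id'      => 'W0tpsmIBdwcYyG50zbta',
    '_version' => 1,
    'result'   => 'created',
];

// The autogenerated ID is returned in the "_id" field.
$generatedId = $response['_id'];

// "result" is "created" for a new document and "updated" for an overwrite.
$wasCreated = $response['result'] === 'created';

printf(
    "Indexed %s/_doc/%s (created: %s)\n",
    $response['_index'],
    $generatedId,
    $wasCreated ? 'yes' : 'no'
);
```

In a real application, $response would come from $client->index($params) rather than a literal array.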


If you need to set other parameters, such as a routing value, specify them in the same array alongside the index and ID. For example, let’s set the routing and timestamp of this new document:

Additional parameters.

$params = [
    'index'     => 'my_index',
    'id'        => 'my_id',
    'routing'   => 'company_xyz',
    'timestamp' => strtotime('-1 day'),
    'body'      => [ 'testField' => 'abc']
];


$response = $client->index($params);


Bulk indexing

Elasticsearch also supports bulk indexing of documents. The bulk API expects JSON action/metadata pairs, separated by newlines. Constructing the request in PHP follows the same pattern: first add an array describing the action (for example, an index action), then add an array containing the document body. Repeat this pair for each document.

A simple example might look like this:

Bulk indexing with PHP arrays.

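// Start from an empty body so earlier $params values are not carried over
$params = ['body' => []];

for ($i = 0; $i < 100; $i++) {
    // Action/metadata entry
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
        ]
    ];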

    $params['body'][] = [
        'my_field'     => 'my_value',
        'second_field' => 'some more values'
    ];
}

$responses = $client->bulk($params);
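
To see how such a body array corresponds to the newline-delimited format the bulk API expects, the following sketch serializes it by hand. The helper toNdjson() is hypothetical and written only for this illustration; you do not need to call anything like it yourself when you pass an array body to the client.

```php
<?php
/**
 * Hypothetical helper (not part of Elasticsearch-PHP): renders a bulk
 * body array as newline-delimited JSON, one entry per line.
 */
function toNdjson(array $body): string
{
    $lines = array_map(
        static fn (array $item): string => json_encode($item, JSON_THROW_ON_ERROR),
        $body
    );

    // The bulk format ends with a newline after the last line.
    return implode("\n", $lines) . "\n";
}

// Two action/document pairs, built the same way as in the loop above.
$body = [
    ['index' => ['_index' => 'my_index', '_id' => 1]],
    ['my_field' => 'my_value'],
    ['index' => ['_index' => 'my_index', '_id' => 2]],
    ['my_field' => 'another_value'],
];

echo toNdjson($body);
```

Each action line is immediately followed by its document line, which is why the PHP array alternates between the two kinds of entries.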

In practice, you’ll likely have more documents than you want to send in a single bulk request. In that case, you need to batch up the requests and periodically send them:

Bulk indexing with batches.

$params = ['body' => []];

for ($i = 1; $i <= 1234567; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_id'    => $i
        ]
    ];

    $params['body'][] = [
        'my_field'     => 'my_value',
        'second_field' => 'some more values'
    ];

    // Every 1000 documents stop and send the bulk request
    if ($i % 1000 == 0) {
        $responses = $client->bulk($params);

        // erase the old bulk request
        $params = ['body' => []];

        // unset the bulk response when you are done to save memory
        unset($responses);
    }
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}
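
A bulk request can succeed as a whole while individual actions fail, so it is worth inspecting each response before discarding it. The sketch below assumes the response is readable as an array with the errors flag and items list that the bulk API returns. The helper collectBulkFailures() and the sample response are hypothetical and exist only for this illustration.

```php
<?php
/**
 * Hypothetical helper (not part of Elasticsearch-PHP): returns the
 * failed items of a bulk response, keyed by their position in the request.
 */
function collectBulkFailures(array $response): array
{
    // "errors" is false when every action succeeded.
    if (empty($response['errors'])) {
        return [];
    }

    $failures = [];
    foreach ($response['items'] as $position => $item) {
        // Each item has a single key naming the action (index, create, ...).
        $action = array_key_first($item);
        $result = $item[$action];

        if (isset($result['error'])) {
            $failures[$position] = [
                'action' => $action,
                'id'     => $result['_id'] ?? null,
                'status' => $result['status'] ?? null,
                'reason' => $result['error']['reason'] ?? 'unknown',
            ];
        }
    }

    return $failures;
}

// Illustrative response; the values are made up.
$sample = [
    'errors' => true,
    'items'  => [
        ['index' => ['_index' => 'my_index', '_id' => '1', 'status' => 201]],
        ['index' => ['_index' => 'my_index', '_id' => '2', 'status' => 400,
            'error' => ['type' => 'mapper_parsing_exception', 'reason' => 'failed to parse']]],
    ],
];

print_r(collectBulkFailures($sample));
```

In the batched loop above, a check like this would run on $responses before it is unset.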