Indexing documents
When you add documents to Elasticsearch, you index JSON documents. This maps naturally to PHP associative arrays, since they can easily be encoded as JSON. Therefore, in Elasticsearch-PHP you create associative arrays and pass them to the client for indexing. There are several methods of ingesting data into Elasticsearch, which we cover here.
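To make the array-to-JSON mapping concrete, the sketch below runs PHP's built-in json_encode on a document array. The field names are illustrative only. One wrinkle worth knowing: an empty PHP array encodes as a JSON array, so an empty JSON object has to be written as new \stdClass().

```php
<?php
// Illustrative document; the field names are made up for this example.
$document = [
    'testField' => 'abc',
    'tags'      => ['php', 'search'],   // list array -> JSON array
    'author'    => ['name' => 'Jane'],  // associative array -> JSON object
];

$json = json_encode($document);
echo $json, PHP_EOL;
// {"testField":"abc","tags":["php","search"],"author":{"name":"Jane"}}

// An empty PHP array is ambiguous and encodes as a JSON array: {"settings":[]}
$emptyArray = json_encode(['settings' => []]);

// Use stdClass when an empty JSON object is required: {"settings":{}}
$emptyObject = json_encode(['settings' => new \stdClass()]);
```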
Single document indexing
When indexing a document, you can either provide an ID or let Elasticsearch generate one for you.
Providing an ID value.
$params = [
    'index' => 'my_index',
    'id'    => 'my_id',
    'body'  => ['testField' => 'abc']
];

// Document will be indexed to my_index/_doc/my_id
$response = $client->index($params);
Omitting an ID value.
$params = [
    'index' => 'my_index',
    'body'  => ['testField' => 'abc']
];

// Document will be indexed to my_index/_doc/<autogenerated ID>
$response = $client->index($params);
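The index API's response reports the ID that was assigned, which matters most when Elasticsearch generated it. The sketch below reads that value from a response array. The helper name summarizeIndexResponse and the sample response are hypothetical stand-ins so that the snippet runs without a cluster; _index, _id, and result are fields of the index API response. Depending on the client version, $client->index() may return a response object rather than a plain array, in which case it would need converting first.

```php
<?php
/**
 * Hypothetical helper: summarizes where a document ended up.
 * Expects the index API response as an associative array.
 */
function summarizeIndexResponse(array $response): string
{
    // 'result' is "created" for a new document and "updated" for an overwrite.
    return sprintf(
        '%s %s/_doc/%s',
        $response['result'],
        $response['_index'],
        $response['_id']
    );
}

// Sample response, trimmed to the fields used here; the ID is invented.
$sample = [
    '_index' => 'my_index',
    '_id'    => 'W0tpsmIBdwcYyG50zbta',
    'result' => 'created',
];

$summary = summarizeIndexResponse($sample);
echo $summary, PHP_EOL; // created my_index/_doc/W0tpsmIBdwcYyG50zbta
```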
If you need to set other parameters, such as a routing value, you specify them in the array alongside index, id, and the other keys. For example, let’s set the routing and timestamp of this new document:
Additional parameters.
$params = [
    'index'     => 'my_index',
    'id'        => 'my_id',
    'routing'   => 'company_xyz',
    'timestamp' => strtotime("-1 day"),
    'body'      => ['testField' => 'abc']
];

$response = $client->index($params);
Bulk Indexing
Elasticsearch also supports bulk indexing of documents. The bulk API expects JSON
action/metadata pairs, separated by newlines. When constructing your documents
in PHP, the process is similar. You first create an action array (for example,
an index action), then you create a document body array. This process repeats
for all your documents.
A simple example might look like this:
Bulk indexing with PHP arrays.
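// Start with an empty body so the request array is always defined
$params = ['body' => []];

for ($i = 0; $i < 100; $i++) {
    // Action/metadata entry
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
        ]
    ];

    // Document body entry
    $params['body'][] = [
        'my_field'     => 'my_value',
        'second_field' => 'some more values'
    ];
}

$responses = $client->bulk($params);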
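To show what the alternating action and document entries correspond to on the wire, the sketch below serializes a small body array into the newline-delimited form the bulk API expects. The function toBulkPayload is hypothetical and for illustration only; when you pass an array to bulk(), the client takes care of this serialization, so you would not normally write it yourself.

```php
<?php
/**
 * Hypothetical helper: renders a bulk body array as newline-delimited JSON.
 */
function toBulkPayload(array $body): string
{
    $payload = '';
    foreach ($body as $line) {
        // One JSON object per line; the final line also ends with a newline.
        $payload .= json_encode($line) . "\n";
    }
    return $payload;
}

$body = [];
for ($i = 1; $i <= 2; $i++) {
    $body[] = ['index' => ['_index' => 'my_index', '_id' => $i]]; // action/metadata
    $body[] = ['my_field' => 'my_value'];                         // document source
}

$payload = toBulkPayload($body);
echo $payload;
// {"index":{"_index":"my_index","_id":1}}
// {"my_field":"my_value"}
// {"index":{"_index":"my_index","_id":2}}
// {"my_field":"my_value"}
```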
In practice, you’ll likely have more documents than you want to send in a single bulk request. In that case, you need to batch up the requests and periodically send them:
Bulk indexing with batches.
$params = ['body' => []];

for ($i = 1; $i <= 1234567; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_id'    => $i
        ]
    ];

    $params['body'][] = [
        'my_field'     => 'my_value',
        'second_field' => 'some more values'
    ];

    // Every 1000 documents stop and send the bulk request
    if ($i % 1000 == 0) {
        $responses = $client->bulk($params);

        // erase the old bulk request
        $params = ['body' => []];

        // unset the bulk response when you are done to save memory
        unset($responses);
    }
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}
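A bulk request can succeed as a whole while individual items fail, so it is usually worth inspecting each response before discarding it. The sketch below assumes the response is available as an associative array with the bulk API's top-level errors flag and per-item items list. The helper collectBulkFailures and the sample response are hypothetical, and array_key_first requires PHP 7.3 or later.

```php
<?php
/**
 * Hypothetical helper: returns the failed items of a bulk response,
 * keyed by their position in the request.
 */
function collectBulkFailures(array $response): array
{
    // 'errors' is false when every item succeeded, so skip the scan.
    if (empty($response['errors'])) {
        return [];
    }

    $failures = [];
    foreach ($response['items'] as $position => $item) {
        // Each item has a single key naming the action (index, create, ...).
        $action = array_key_first($item);
        $result = $item[$action];

        if (isset($result['error'])) {
            $failures[$position] = [
                'id'     => $result['_id'] ?? null,
                'status' => $result['status'],
                'reason' => $result['error']['reason'] ?? 'unknown',
            ];
        }
    }
    return $failures;
}

// Invented sample: the first item succeeds, the second is rejected.
$sample = [
    'errors' => true,
    'items'  => [
        ['index' => ['_id' => '1', 'status' => 201]],
        ['index' => [
            '_id'    => '2',
            'status' => 400,
            'error'  => ['type' => 'mapper_parsing_exception', 'reason' => 'failed to parse'],
        ]],
    ],
];

$failures = collectBulkFailures($sample);
```

In the batching loop above, a check like this would fit right after each $client->bulk($params) call and before the response is unset.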