Indexing Documentsedit

When you add documents to Elasticsearch, you index JSON documents. This maps naturally to PHP associative arrays, since they can easily be encoded in JSON. Therefore, in Elasticsearch-PHP you create and pass associative arrays to the client for indexing. There are several methods of ingesting data into Elasticsearch, which we will cover here.

Single document indexingedit

When indexing a document, you can either provide an ID or let elasticsearch generate one for you.

Providing an ID value. 

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [ 'testField' => 'abc']
];

// Document will be indexed to my_index/my_type/my_id
$response = $client->index($params);

Omitting an ID value. 

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [ 'testField' => 'abc']
];

// Document will be indexed to my_index/my_type/<autogenerated ID>
$response = $client->index($params);

If you need to set other parameters, such as a routing value, you specify those in the array alongside the index, type, etc. For example, let’s set the routing and timestamp of this new document:

Additional parameters. 

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'routing' => 'company_xyz',
    'timestamp' => strtotime("-1d"),
    'body' => [ 'testField' => 'abc']
];


$response = $client->index($params);

Bulk Indexingedit

Elasticsearch also supports bulk indexing of documents. The bulk API expects JSON action/metadata pairs, separated by newlines. When constructing your documents in PHP, the process is similar. You first create an action array object (e.g. index object), then you create a document body object. This process repeats for all your documents.

A simple example might look like this:

Bulk indexing with PHP arrays. 

for($i = 0; $i < 100; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type' => 'my_type',
        ]
    ];

    $params['body'][] = [
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    ];
}

$responses = $client->bulk($params);

In practice, you’ll likely have more documents than you want to send in a single bulk request. In that case, you need to batch up the requests and periodically send them:

Bulk indexing with batches. 

$params = ['body' => []];

for ($i = 1; $i <= 1234567; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type' => 'my_type',
            '_id' => $i
        ]
    ];

    $params['body'][] = [
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    ];

    // Every 1000 documents stop and send the bulk request
    if ($i % 1000 == 0) {
        $responses = $client->bulk($params);

        // erase the old bulk request
        $params = ['body' => []];

        // unset the bulk response when you are done to save memory
        unset($responses);
    }
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}