Create an index (advanced example)

edit

Create an index (advanced example)

edit

This is a more complicated example of creating an index, showing how to define analyzers, tokenizers, filters and index settings. Although essentially the same as the previous example, the more complicated example can be helpful for "real world" usage of the client, since this particular syntax is easy to mess up.

$params = [
    'index' => 'reuters',
    'body' => [
        'settings' => [ 
            'number_of_shards' => 1,
            'number_of_replicas' => 0,
            'analysis' => [ 
                'filter' => [
                    'shingle' => [
                        'type' => 'shingle'
                    ]
                ],
                'char_filter' => [
                    'pre_negs' => [
                        'type' => 'pattern_replace',
                        'pattern' => '(\\w+)\\s+((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\b',
                        'replacement' => '~$1 $2'
                    ],
                    'post_negs' => [
                        'type' => 'pattern_replace',
                        'pattern' => '\\b((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\s+(\\w+)',
                        'replacement' => '$1 ~$2'
                    ]
                ],
                'analyzer' => [
                    'reuters' => [
                        'type' => 'custom',
                        'tokenizer' => 'standard',
                        'filter' => ['lowercase', 'stop', 'kstem']
                    ]
                ]
            ]
        ],
        'mappings' => [ 
            '_default_' => [    
                'properties' => [
                    'title' => [
                        'type' => 'string',
                        'analyzer' => 'reuters',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                    'body' => [
                        'type' => 'string',
                        'analyzer' => 'reuters',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                    'combined' => [
                        'type' => 'string',
                        'analyzer' => 'reuters',
                        'term_vector' => 'yes'
                    ],
                    'topics' => [
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    ],
                    'places' => [
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    ]
                ]
            ],
            'my_type' => [  
                'properties' => [
                    'my_field' => [
                        'type' => 'string'
                    ]
                ]
            ]
        ]
    ]
];
$client->indices()->create($params);

The top level settings contains config about the index (# of shards, etc) as well as analyzers

analysis is nested inside of settings, and contains tokenizers, filters, char filters and analyzers

mappings is another element nested inside of body, and contains the mappings for various types

The _default_ type is a dynamic template that is applied to all fields that don’t have an explicit mapping

The my_type type is an example of a user-defined type that holds a single field, my_field