IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Usage

Using the attachment type is simple: in your mapping JSON, set the type of a field to attachment. For example:

PUT /test
{
  "mappings": {
    "person" : {
      "properties" : {
        "my_attachment" : { "type" : "attachment" }
      }
    }
  }
}

In this case, the JSON to index can be:

PUT /test/person/1
{
    "my_attachment" : "... base64 encoded attachment ..."
}
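A minimal Python sketch of building that request body. It assumes the file has already been read into memory as bytes. The helper name simple_attachment_doc is hypothetical, and sending the body to PUT /test/person/1 is left to whichever HTTP client you use:

```python
import base64
import json


def simple_attachment_doc(raw: bytes) -> str:
    """Build the JSON body for the simple form, where the field value
    is just the base64-encoded file contents (hypothetical helper)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"my_attachment": encoded})


if __name__ == "__main__":
    # Stand-in for bytes read from a real file, e.g. open("my.pdf", "rb").read()
    sample = b"Hello, attachment!"
    print(simple_attachment_doc(sample))
```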

Alternatively, you can use a more elaborate JSON object if the content type, resource name, or language needs to be set explicitly:

PUT /test/person/1
{
    "my_attachment" : {
        "_content_type" : "application/pdf",
        "_name" : "resource/name/of/my.pdf",
        "_language" : "en",
        "_content" : "... base64 encoded attachment ..."
    }
}
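The object form can be assembled the same way. This sketch assumes that any of _content_type, _name and _language may be left out when it does not need to be set explicitly, and that _content always carries the base64 data. The helper name detailed_attachment_doc is hypothetical:

```python
import base64
import json
from typing import Optional


def detailed_attachment_doc(
    raw: bytes,
    content_type: Optional[str] = None,
    name: Optional[str] = None,
    language: Optional[str] = None,
) -> dict:
    """Build the object form of an attachment field (hypothetical helper).
    Only the optional keys that are supplied are included; _content is always set."""
    field = {}
    if content_type is not None:
        field["_content_type"] = content_type
    if name is not None:
        field["_name"] = name
    if language is not None:
        field["_language"] = language
    field["_content"] = base64.b64encode(raw).decode("ascii")
    return {"my_attachment": field}


if __name__ == "__main__":
    # Placeholder bytes standing in for a real PDF file
    body = detailed_attachment_doc(
        b"%PDF-1.4 ...",
        content_type="application/pdf",
        name="resource/name/of/my.pdf",
        language="en",
    )
    print(json.dumps(body, indent=4))
```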

The attachment type not only indexes the extracted content of the document in the content sub-field, but also automatically adds metadata about the attachment (when available).

The supported metadata fields are:

date
title
name (only available if you set _name, see above)
author
keywords
content_type
content_length (the original content length before text extraction, that is, the file size)
language
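Because the attachment is sent base64-encoded, the string in the request is longer than the file itself. Standard padded base64 emits 4 characters for every 3 input bytes, rounding the last group up. content_length, as listed above, reports the original file size rather than the encoded length. A small Python illustration of the arithmetic, assuming standard padded base64 and a hypothetical 1000-byte file:

```python
import base64


def base64_length(n_bytes: int) -> int:
    # Padded base64: 4 output characters per 3-byte group, last group rounded up.
    return 4 * ((n_bytes + 2) // 3)


raw = bytes(1000)  # hypothetical 1000-byte file (all zero bytes)
encoded = base64.b64encode(raw)

# The encoded length matches the formula; content_length would be len(raw).
assert len(encoded) == base64_length(len(raw))
print(len(raw), len(encoded))  # prints: 1000 1336
```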

They can be queried using the "dot notation", for example: my_attachment.author.
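For example, a match query on the author metadata addresses it as my_attachment.author. The following Python sketch builds such a search body, assuming it is sent to the test index created above (for example with GET /test/_search). The helper name match_on_metadata and the author value "kimchy" are hypothetical:

```python
import json


def match_on_metadata(field: str, sub_field: str, text: str) -> dict:
    """Build a match query against one attachment sub-field,
    addressed with dot notation such as my_attachment.author (hypothetical helper)."""
    return {"query": {"match": {f"{field}.{sub_field}": text}}}


if __name__ == "__main__":
    body = match_on_metadata("my_attachment", "author", "kimchy")
    print(json.dumps(body, indent=2))
```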

Both the metadata and the actual content are simple core type mappers (text, date, …), so they can be controlled in the mappings. For example:

PUT /test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard"]
          }
        }
      }
    }
  },
  "mappings": {
    "person" : {
      "properties" : {
        "file" : {
          "type" : "attachment",
          "fields" : {
            "content" : {"index" : false},
            "title" : {"store" : true},
            "date" : {"store" : true},
            "author" : {"analyzer" : "my_analyzer"},
            "keywords" : {"store" : true},
            "content_type" : {"store" : true},
            "content_length" : {"store" : true},
            "language" : {"store" : true}
          }
        }
      }
    }
  }
}

In the above example, the extracted content is mapped under the field name content, and we decide not to index it, so it will only be available in the _all field. The other fields map to their respective metadata names, but there is no need to specify their type (such as text or date) since it is already known.
