10 January 2018

Ask Me Anything Insights from Elastic{ON} Tour Seoul

樂 "I've heard the mapping type is gone in 6.0. How can I create parent/child relations in 6.0?" 邏 "You can use the join datatype to create parent/child relations." 樂 "6.0 does not separate parent and child by mapping types. How do I know a document is a parent or a child?"

Being at the Ask Me Anything (AMA) booth at Elastic{ON} Tour Seoul was a very interesting experience. Those were some of the questions we got at the AMA. Many of the questions we received are likely relevant to a lot of users, including the one about the removal of mapping types. This blog post walks through some of the most frequently asked questions from the tour.

Kiju Kim, Elastic Support Engineer

(AMA booth at Elastic{ON} Tour Seoul. This picture is not directly related to the questions in this blog post.)

Removal of Mapping Types: Parent/Child relationships

In Elasticsearch 6.0, new indices can have only one mapping type (https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html). Indices created in 5.x with multiple mapping types continue to function after an upgrade to 6.x. You can refer to the following video clip about the rolling upgrade from 5.x to 6.x:

Before this change, a parent/child relationship required two different types within the same index: a parent type and a child type. Since types were removed, in Elasticsearch 6.x you create a parent/child relationship with a new field type, the join datatype (https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html). The following is an example:

PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": { 
          "type": "join",
          "relations": {
            "question": "answer" 
          }
        }
      }
    }
  }
}

This example defines a relation between question and answer, where question is the parent of answer. my_join_field is the name of the field that defines the relation. To index a parent document (a question), set the name in the join field to the parent relation name:

PUT my_index/doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

For parent documents, you can also use a shortcut and give the relation name directly:

PUT my_index/doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question" 
}

For a child document, specify the child relation name (answer) and the ID of the parent document (here, 1). Routing is mandatory, because a parent and its children must be stored in the same shard. Routing the child by the parent's ID places it on the parent's shard.

PUT my_index/doc/3?routing=1&refresh 
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}
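The rule that a child must carry its parent's ID and be routed by it can be captured in a small client-side helper. The following is a minimal Python sketch. The function name `join_index_request` and its parameters are hypothetical. It only builds the request path and body for the `my_index`/`doc`/`my_join_field` example above, and it does not send anything to Elasticsearch.

```python
import json
from typing import Optional, Tuple


def join_index_request(doc_id: str, text: str, relation: str,
                       parent_id: Optional[str] = None) -> Tuple[str, str]:
    """Build the path and JSON body for indexing a document with a join field.

    Hypothetical helper: index, type and field names follow the blog example.
    """
    path = f"my_index/doc/{doc_id}?refresh"
    join_value = {"name": relation}
    if parent_id is not None:
        # Child documents record their parent's ID in the join field...
        join_value["parent"] = parent_id
        # ...and, for a single-level relation like this one, are routed by that
        # ID so they land on the same shard as the parent.
        path = f"my_index/doc/{doc_id}?routing={parent_id}&refresh"
    body = {"text": text, "my_join_field": join_value}
    return path, json.dumps(body)
```

For example, `join_index_request("3", "This is an answer", "answer", parent_id="1")` yields the path `my_index/doc/3?routing=1&refresh`, matching the child request above. Tying routing to the presence of a parent ID means a child cannot be built without it.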

Now you can query all the documents with "GET my_index/_search" and get the following result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a question",
          "my_join_field": {
            "name": "question"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "3",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      }
    ]
  }
}

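Here you can see that the child document stores the parent ID ("1") and uses answer as the relation name in the join field. To find out whether a document is a parent or a child, query the join field by relation name. For example, the following query returns all parents: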

GET my_index/_search
{
  "query": {
    "match": {
      "my_join_field": "question"
    }
  }
}

Similarly, the following query returns all children:

GET my_index/_search
{
  "query": {
    "match": {
      "my_join_field": "answer"
    }
  }
}
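Besides querying, you can tell parents from children on the client side by reading the relation name from `_source`. The following is a minimal Python sketch that works on an already-parsed search response such as the one above. The function names `split_by_relation` and `relation_query` are hypothetical. The code assumes the join field is called `my_join_field`, as in the example.

```python
from typing import Dict, List


def split_by_relation(response: dict) -> Dict[str, List[str]]:
    """Group hit IDs by the relation name stored in my_join_field."""
    groups: Dict[str, List[str]] = {}
    for hit in response["hits"]["hits"]:
        join = hit["_source"].get("my_join_field")
        if join is None:
            continue  # this document does not take part in the join
        # Documents indexed with the shortcut form keep a plain string in _source.
        name = join if isinstance(join, str) else join["name"]
        groups.setdefault(name, []).append(hit["_id"])
    return groups


def relation_query(relation: str) -> dict:
    """Body for GET my_index/_search matching all documents with this relation."""
    return {"query": {"match": {"my_join_field": relation}}}
```

Applied to the search result shown earlier, `split_by_relation` puts document 1 under "question" and document 3 under "answer".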

For more information, see the related blog posts: https://www.elastic.co/blog/removal-of-mapping-types-elasticsearch https://www.elastic.co/blog/kibana-6-removal-of-mapping-types

Rejecting mapping update to…

Related to the removal of mapping types in 6.0, you may get the following errors when you ingest data using Beats or Logstash:

Beats example
WARN Can not index event (status=400): {"type":"illegal_argument_exception","reason":"Rejecting mapping update to [index-2017.12.08] as the final mapping would have more than 1 type: [doc, type_01]"}
Logstash example
[2017-12-13T18:17:39,184][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"test_01", :_type=>"doc", :_routing=>nil}, #LogStash::Event:0x38074789], :response=>{"index"=>{"_index"=>"test_01", "_type"=>"doc", "_id"=>"yt8qT2ABg3XDnkh4Rd8-", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Rejecting mapping update to [test_01] as the final mapping would have more than 1 type: [type_01, doc]"}}}}

This happens because Beats and Logstash use doc as the mapping type, but the index already has a mapping type called type_01. Elasticsearch 6.x doesn't allow multiple mapping types in one index. It tries to merge the new type into the existing mapping, fails, and rejects the document. To fix it, either use doc as the mapping type, or configure Logstash to use type_01 as the document type (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_type).
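When these warnings show up in large log volumes, it can help to extract the index name and the conflicting types automatically. The following is a hypothetical Python helper; it is not part of Beats or Logstash. Its regular expression assumes the exact reason wording shown in the two examples above, which may differ in other versions.

```python
import re
from typing import List, Optional, Tuple

# Matches the "reason" text seen in the Beats and Logstash warnings above.
_REASON = re.compile(
    r"Rejecting mapping update to \[(?P<index>[^\]]+)\] as the final mapping "
    r"would have more than 1 type: \[(?P<types>[^\]]+)\]"
)


def parse_type_conflict(log_line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (index, [types]) from a rejection message, or None if absent."""
    match = _REASON.search(log_line)
    if match is None:
        return None
    types = [t.strip() for t in match.group("types").split(",")]
    return match.group("index"), types
```

For the Logstash example, this yields `("test_01", ["type_01", "doc"])`, which tells you which two types collided in which index.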

Split Brain (Quorum) Problem

We got several questions like this one: "We just started using Elasticsearch. Could you review whether our cluster configuration is appropriate? We have 2 master nodes …"

While you might think that 2 master nodes are enough for High Availability (HA), you actually need at least 3 master-eligible nodes to provide HA while avoiding a split brain. A split brain can happen when you have two master-eligible nodes and the network between them is disconnected. Each master-eligible node elects itself as the new master, and the two form independent clusters. If each cluster is then updated differently, you cannot recover without data loss, even after the network connection is restored.

To avoid split brain, you must have at least 3 master-eligible nodes and set discovery.zen.minimum_master_nodes to (master_eligible_nodes / 2) + 1, using integer division (e.g. 2 for 3 master-eligible nodes). It is best to use an odd number of master-eligible nodes.
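The quorum arithmetic can be checked with a few lines of Python. This sketch is illustrative only, and the function names are hypothetical. It shows why at most one side of a network partition can reach quorum. It also shows why two master-eligible nodes cannot give you HA. With `minimum_master_nodes` at 2, losing either node stops master election. Lowering it to 1 would allow a split brain instead.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: (n / 2) + 1 with integer division."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size: int, master_eligible: int) -> bool:
    """True if a partition holding this many master-eligible nodes reaches quorum."""
    return partition_size >= minimum_master_nodes(master_eligible)


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a master can still be elected."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, "nodes: quorum", minimum_master_nodes(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

For example, `tolerated_failures(3)` and `tolerated_failures(4)` are both 1, which is why a fourth master-eligible node does not add resilience.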

You can refer to the following links for more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#split-brain https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Machine Learning

Machine learning is one of the hottest topics, and everyone is interested in it.

"How can I detect anomalies in the logs from multiple servers? Should I run an ML job for each server?"

You don't need a separate job per server. Instead, you can create a single multi-metric job and split the data by server. In Kibana, select "Machine Learning", create a new job, select an index pattern, and choose "Multi metric". Then, in the "Split Data" section, select the field that holds the server names (e.g. "host"). The following is an example of this use case:

(Screenshot: Create a multi-metric job)

Enter a name in the "Name" field of the "Job Details" section and click "Create Job". You can then run the job and explore the anomalies in the Anomaly Explorer.

(Screenshot: Anomaly Explorer)
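The same setup can also be expressed as a job configuration. The following is a minimal Python sketch; it is not the exact configuration Kibana generates. It builds a JSON body with one detector per metric. Each detector is split by the server field via `partition_field_name`, so that a single job models every server separately.

Several details are assumptions:
- The metric field names, the `mean` function, the 15-minute bucket span and the `@timestamp` time field are all placeholders.
- In 6.x, such a body would be sent to `PUT _xpack/ml/anomaly_detectors/<job_id>`.
- A datafeed (not shown) is still needed to feed the job data.

```python
import json
from typing import List


def multi_metric_job(metrics: List[str], split_field: str = "host",
                     bucket_span: str = "15m",
                     time_field: str = "@timestamp") -> str:
    """Build a job body with one detector per metric, each split by server."""
    detectors = [
        {
            "function": "mean",                   # assumed aggregation for every metric
            "field_name": metric,
            "partition_field_name": split_field,  # model each server separately
        }
        for metric in metrics
    ]
    body = {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": detectors,
            "influencers": [split_field],
        },
        "data_description": {"time_field": time_field},
    }
    return json.dumps(body, indent=2)
```

For example, `multi_metric_job(["cpu_pct", "response_time"])` (hypothetical field names) produces two detectors, both partitioned by `host`.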

You can refer to the following links for more information: https://www.elastic.co/guide/en/x-pack/current/ml-gs-multi-jobs.html https://www.elastic.co/kr/videos/machine-learning-tutorial-creating-a-multi-metric-job

Delete Alias

This is another new question related to 6.0. Last fall, I was chatting with a colleague when he got a phone call saying:

"I just deleted indices. Very important indices…"

Up to 5.x, if a user ran DELETE against an alias, Elasticsearch deleted all the indices that the alias pointed to. To remove only the alias, use the "remove" action of the POST /_aliases API. In Elasticsearch 6.x, suppose you create an alias as follows:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "my_index", "alias" : "alias1" } }
    ]
}

Then you try to delete it with DELETE as below:

DELETE alias1

The request is rejected with the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "The provided expression [alias1] matches an alias, specify the corresponding concrete indices instead."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The provided expression [alias1] matches an alias, specify the corresponding concrete indices instead."
  },
  "status": 400
}

The 6.0 delete index API documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.0/indices-delete-index.html) states: "Aliases cannot be used to delete an index."
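Removing an alias, rather than the indices behind it, goes through the "remove" action of the POST /_aliases API. The following is a minimal Python sketch that builds that request body from the parsed response of GET /_alias/alias1. The helper name is hypothetical, and nothing is sent to the cluster. The response shape `{"my_index": {"aliases": {"alias1": {}}}}` is assumed.

```python
import json
from typing import Dict


def remove_alias_actions(alias: str, get_alias_response: Dict[str, dict]) -> str:
    """Build a POST /_aliases body that removes `alias` from every index it points to."""
    actions = [
        {"remove": {"index": index, "alias": alias}}
        # Keep only indices that actually carry the alias; this filter is
        # harmless if the response was already limited to this alias.
        for index, info in sorted(get_alias_response.items())
        if alias in info.get("aliases", {})
    ]
    return json.dumps({"actions": actions})
```

For the example above, this yields a single action, `{"remove": {"index": "my_index", "alias": "alias1"}}`. my_index itself is left untouched.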

In addition, you cannot recover indices that were deleted with the delete API. Please make sure to back up your data periodically using snapshots. You can refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html for more information.

I was very happy to be at Elastic{ON} Tour Seoul. I am looking forward to seeing you all again at the Elastic{ON} 2018 Conference (https://www.elastic.co/elasticon/conf/2018/sf) and to getting new and useful information there. Thank you!
