Field Collapsing

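A common requirement is to present search results grouped by a particular field. For example, we might want to return the most relevant blog posts grouped by the user's name. Grouping by name implies a terms aggregation. To group on the user's whole name, the name field must also be available in its original not_analyzed form, as explained in Aggregations and Analysis: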

PUT /my_index/_mapping/blogpost
{
  "properties": {
    "user": {
      "properties": {
        "name": { 
          "type": "string",
          "fields": {
            "raw": { 
              "type":  "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

	The `user.name` field will be used for full-text search.
	The `user.name.raw` field will be used for grouping with the `terms` aggregation.
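To see why grouping needs the `not_analyzed` sub-field, the following is a minimal Python sketch, not Elasticsearch code. The `analyze` function is a hypothetical stand-in for an analyzer (it only lowercases and splits on whitespace), and the list of names is illustrative. It contrasts the buckets produced by grouping on analyzed tokens with those produced by grouping on whole values:

```python
from collections import Counter

def analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()

# Illustrative author names, one entry per blog post.
names = ["John Smith", "Alice John", "John Smith"]

# Grouping on an analyzed field: one bucket per token, counting each
# document at most once per token.
analyzed_buckets = Counter(
    token for name in names for token in set(analyze(name))
)

# Grouping on a not_analyzed field: one bucket per whole value.
raw_buckets = Counter(names)

print(dict(analyzed_buckets))
print(dict(raw_buckets))
```

With analyzed tokens, "John Smith" and "Alice John" both land in a shared `john` bucket, which is not what grouping by author should do. With whole values, each author gets exactly one bucket.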

Then add some data:

PUT /my_index/user/1
{
  "name": "John Smith",
  "email": "john@smith.com",
  "dob": "1970/10/24"
}

PUT /my_index/blogpost/2
{
  "title": "Relationships",
  "body": "It's complicated...",
  "user": {
    "id": 1,
    "name": "John Smith"
  }
}

PUT /my_index/user/3
{
  "name": "Alice John",
  "email": "alice@john.com",
  "dob": "1979/01/04"
}

PUT /my_index/blogpost/4
{
  "title": "Relationships are cool",
  "body": "It's not complicated at all...",
  "user": {
    "id": 3,
    "name": "Alice John"
  }
}

Now we can query for blog posts whose title contains relationships and whose author name contains John, and group the results by user name with the help of the top_hits aggregation:

GET /my_index/blogpost/_search
{
  "size" : 0, 
  "query": { 
    "bool": {
      "must": [
        { "match": { "title":     "relationships" }},
        { "match": { "user.name": "John"          }}
      ]
    }
  },
  "aggs": {
    "users": {
      "terms": {
        "field":   "user.name.raw",      
        "order": { "top_score": "desc" } 
      },
      "aggs": {
        "top_score": { "max":      { "script":  "_score"           }}, 
        "blogposts": { "top_hits": { "_source": "title", "size": 5 }}  
      }
    }
  }
}

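	The blog posts that we are interested in are returned under the `blogposts` aggregation, so we can disable the usual search `hits` by setting `size` to 0.
	The `query` returns blog posts about `relationships` written by users named `John`.
	The `terms` aggregation creates a bucket for each `user.name.raw` value.
	The `top_score` aggregation records the highest document score in each bucket, and the `users` aggregation orders its buckets by that value.
	The `top_hits` aggregation returns just the `title` field of the five most relevant blog posts for each user.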
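The combined effect of the three aggregations can be sketched in plain Python. This is only an in-memory illustration of the grouping logic, not how Elasticsearch implements it. The function name `collapse_by_field`, the scores, and the third blog post are all made up for the example:

```python
from collections import defaultdict

def collapse_by_field(hits, field, per_group=5):
    """Group scored hits by `field`, keep the best `per_group` hits in each
    group, and order the groups by their best score, highest first."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[field]].append(hit)          # like the `terms` buckets

    buckets = []
    for key, members in groups.items():
        members.sort(key=lambda h: h["score"], reverse=True)
        buckets.append({
            "key": key,
            "doc_count": len(members),
            "top_score": members[0]["score"],   # like `max` on `_score`
            "top_hits": [h["title"] for h in members[:per_group]],
        })
    # Like "order": { "top_score": "desc" } on the `terms` aggregation.
    buckets.sort(key=lambda b: b["top_score"], reverse=True)
    return buckets

# Hypothetical scored hits; the third post does not exist in the sample data.
hits = [
    {"user": "John Smith", "title": "Relationships", "score": 0.35},
    {"user": "Alice John", "title": "Relationships are cool", "score": 0.22},
    {"user": "John Smith", "title": "More on relationships", "score": 0.10},
]
buckets = collapse_by_field(hits, "user", per_group=1)
for bucket in buckets:
    print(bucket["key"], bucket["top_hits"])
```

Each user appears once in the output, represented by that user's best-scoring posts, which is the "collapsing" that gives this technique its name.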

The abbreviated response is shown here:

...
"hits": {
  "total":     2,
  "max_score": 0,
  "hits":      [] 
},
"aggregations": {
  "users": {
     "buckets": [
        {
           "key":       "John Smith", 
           "doc_count": 1,
           "blogposts": {
              "hits": { 
                 "total":     1,
                 "max_score": 0.35258877,
                 "hits": [
                    {
                       "_index": "my_index",
                       "_type":  "blogpost",
                       "_id":    "2",
                       "_score": 0.35258877,
                       "_source": {
                          "title": "Relationships"
                       }
                    }
                 ]
              }
           },
           "top_score": { 
              "value": 0.3525887727737427
           }
        },
...

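	The `hits` array is empty because we set `size` to 0.
	There is a bucket for each user who appeared in the top-level results.
	Under each user bucket there is a `blogposts.hits` array containing the top results for that user.
	The user buckets are sorted by each user's most relevant blog post.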
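A client typically walks this nested structure to pull out the grouped results. The sketch below assumes the response has already been parsed from JSON into a Python dict. The helper name `titles_by_user` is hypothetical, and the `response` value is a trimmed copy of the first bucket from the abbreviated response:

```python
def titles_by_user(response):
    """Return (user name, [titles]) pairs in the order the buckets arrive."""
    result = []
    for bucket in response["aggregations"]["users"]["buckets"]:
        inner_hits = bucket["blogposts"]["hits"]["hits"]
        titles = [hit["_source"]["title"] for hit in inner_hits]
        result.append((bucket["key"], titles))
    return result

# Trimmed copy of the abbreviated response (first bucket only).
response = {
    "hits": {"total": 2, "max_score": 0, "hits": []},
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": "John Smith",
                    "doc_count": 1,
                    "blogposts": {
                        "hits": {
                            "total": 1,
                            "max_score": 0.35258877,
                            "hits": [
                                {
                                    "_index": "my_index",
                                    "_type": "blogpost",
                                    "_id": "2",
                                    "_score": 0.35258877,
                                    "_source": {"title": "Relationships"},
                                }
                            ],
                        }
                    },
                    "top_score": {"value": 0.3525887727737427},
                }
            ]
        }
    },
}

for user, titles in titles_by_user(response):
    print(user, titles)
```

Because the buckets already arrive sorted by `top_score`, the helper preserves their order rather than sorting again.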

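Using the top_hits aggregation is equivalent to running one query to return the names of the users with the most relevant blog posts, and then running the same query again for each user to get that user's best blog posts. The top_hits approach is much more efficient, because it needs only a single request.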

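The top hits returned in each bucket are the result of running a lightweight mini-query based on the original main query. The mini-query supports the usual features that you would expect from search, such as highlighting and pagination.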
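As a rough illustration of those mini-query features, the Python sketch below builds the body of a `top_hits` sub-aggregation that asks for one page of results with highlighting. It assumes that `top_hits` accepts `from`, `size`, and `highlight` in the same shape as a regular search request. The helper name `top_hits_page` and its parameters are hypothetical:

```python
import json

def top_hits_page(page, page_size, highlight_field):
    """Build a `top_hits` aggregation body for one zero-based page of hits,
    with highlighting on `highlight_field`. Assumes `from`, `size`, and
    `highlight` follow the same shape as in a regular search request."""
    return {
        "top_hits": {
            "_source": "title",
            "from": page * page_size,   # skip the hits on earlier pages
            "size": page_size,
            "highlight": {"fields": {highlight_field: {}}},
        }
    }

# Second page (zero-based page 1) of five posts per user, highlighting titles.
blogposts_agg = top_hits_page(page=1, page_size=5, highlight_field="title")
print(json.dumps(blogposts_agg, indent=2))
```

The resulting dict would take the place of the `blogposts` sub-aggregation in the search request shown earlier.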
