Returning a partial nested document in ElasticSearch

Question

I'd like to search an array of nested documents and return only those that fit specific criteria.

An example mapping is:

{
  "book": {
    "properties": {
      "title": {"type": "string"},
      "chapters": {
        "type": "nested",
        "properties": {
          "title": {"type": "string"},
          "length": {"type": "long"}
        }
      }
    }
  }
}

So, say I want to look for chapters titled "epilogue". Not all books have such a chapter, but if I use a nested query, the result contains every chapter of each book that has such a chapter, while all I'm interested in is the chapters that actually have that title.

I'm mainly concerned about I/O and network traffic, since there might be a lot of chapters.

Also, is there a way of retrieving ONLY the nested document, without the containing doc?

Recommended answer

This is a very old question that I stumbled upon, so I'll show two different ways of handling it.

Let's prepare the index and some test data first:

PUT /bookindex
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string"
        },
        "chapters": {
          "type": "nested",
          "properties": {
            "title": {
              "type": "string"
            },
            "length": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

PUT /bookindex/book/1
{
  "title": "My first book ever",
  "chapters": [
    {
      "title": "epilogue",
      "length": 1230
    },
    {
      "title": "intro",
      "length": 200
    }
  ]
}

PUT /bookindex/book/2
{
  "title": "Book of life",
  "chapters": [
    {
      "title": "epilogue",
      "length": 17
    },
    {
      "title": "toc",
      "length": 42
    }
  ]
}

Now that we have this data in Elasticsearch, we can retrieve just the relevant hits using inner_hits. This approach is very straightforward, but I prefer the approach outlined at the end.

# Inner hits query
POST /bookindex/book/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "chapters",
      "query": {
        "match": {
          "chapters.title": "epilogue"
        }
      },
      "inner_hits": {}
    }
  }
}

The nested query with inner_hits returns the parent documents, and each hit contains an inner_hits object holding all of the matching nested documents, including scoring information.
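To illustrate how such a response might be consumed on the client side, here is a minimal Python sketch. It assumes the usual layout in which the matches sit under `hits.hits[].inner_hits.<path>.hits.hits[]`; the function name `matching_chapters` and the trimmed `sample_inner_hits_response` are hypothetical and hand-written for illustration, not real output from the index above.

```python
from typing import Any, Dict, List


def matching_chapters(response: Dict[str, Any], path: str = "chapters") -> List[Dict[str, Any]]:
    """Collect the nested documents listed under inner_hits for every top-level hit."""
    chapters = []
    for hit in response.get("hits", {}).get("hits", []):
        # By default the inner_hits entry is keyed by the nested path.
        inner = hit.get("inner_hits", {}).get(path, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            # Keep the parent id so each chapter can be traced back to its book.
            chapters.append({"book_id": hit["_id"], **nested_hit.get("_source", {})})
    return chapters


# Hand-written sample shaped like an inner_hits response (trimmed, assumed, not real output).
sample_inner_hits_response = {
    "hits": {
        "hits": [
            {"_id": "1", "inner_hits": {"chapters": {"hits": {"hits": [
                {"_source": {"title": "epilogue", "length": 1230}}]}}}},
            {"_id": "2", "inner_hits": {"chapters": {"hits": {"hits": [
                {"_source": {"title": "epilogue", "length": 17}}]}}}},
        ]
    }
}

for chapter in matching_chapters(sample_inner_hits_response):
    print(chapter)
```

Because the query sets `"_source": false`, the parent documents carry no source of their own, so only the matching chapters travel over the wire.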

My preferred approach to this type of query is a nested aggregation with a filter sub-aggregation, which in turn contains a top_hits sub-aggregation. The query looks like this:

# Nested and filter aggregation
POST /bookindex/book/_search
{
  "size": 0,
  "aggs": {
    "nested": {
      "nested": {
        "path": "chapters"
      },
      "aggs": {
        "filter": {
          "filter": {
            "match": { "chapters.title": "epilogue" }
          },
          "aggs": {
            "t": {
              "top_hits": {
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}

The top_hits sub-aggregation is the one that actually retrieves the nested documents, and it supports the from and size properties, among others. From the documentation:

If the top_hits aggregator is wrapped in a nested or reverse_nested aggregator then nested hits are being returned. Nested hits are in a sense hidden mini documents that are part of regular document where in the mapping a nested field type has been configured. The top_hits aggregator has the ability to un-hide these documents if it is wrapped in a nested or reverse_nested aggregator. Read more about nested in the nested type mapping.

The response from Elasticsearch is (IMO) prettier and "easier" to parse, and it seems to come back faster (though this is not a scientific observation).
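As a rough sketch of what "easier to parse" can mean in practice, the Python snippet below walks the aggregation names used in the query above (`nested`, `filter`, `t`) and pulls out the chapter sources. The function name `chapters_from_aggregation` and the trimmed `sample_agg_response` are hypothetical and written by hand for illustration; the layout assumes the usual `aggregations.<name>...hits.hits[]._source` structure.

```python
from typing import Any, Dict, List


def chapters_from_aggregation(
    response: Dict[str, Any],
    nested_agg: str = "nested",
    filter_agg: str = "filter",
    top_hits_agg: str = "t",
) -> List[Dict[str, Any]]:
    """Return the _source of every nested hit found under the top_hits aggregation."""
    top_hits = (
        response.get("aggregations", {})
        .get(nested_agg, {})
        .get(filter_agg, {})
        .get(top_hits_agg, {})
    )
    return [hit.get("_source", {}) for hit in top_hits.get("hits", {}).get("hits", [])]


# Hand-written sample shaped like the aggregation response (trimmed, assumed, not real output).
sample_agg_response = {
    "aggregations": {
        "nested": {
            "doc_count": 4,
            "filter": {
                "doc_count": 2,
                "t": {"hits": {"hits": [
                    {"_source": {"title": "epilogue", "length": 1230}},
                    {"_source": {"title": "epilogue", "length": 17}},
                ]}},
            },
        }
    }
}

for chapter in chapters_from_aggregation(sample_agg_response):
    print(chapter)
```

All matching chapters arrive in one flat list, so there is no per-book loop as with inner_hits. Paging through many chapters would rely on the from and size properties of top_hits mentioned above.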
