elasticsearch - comparing a nested field with another field in the document

Question

I need to compare 2 fields in the same document where the actual value does not matter. Consider this document:

_source: {
    id: 123,
    primary_content_type_id: 12,
    content: [
        {
            id: 4,
            content_type_id: 1,
            assigned: true
        },
        {
            id: 5,
            content_type_id: 12,
            assigned: false
        }
    ]
}

I need to find all documents in which the primary content is not assigned. I cannot find a way to compare primary_content_type_id to the nested content.content_type_id to ensure they have the same value. This is what I have tried using a script. I do not think I understand scripts, but they may be a way to solve this problem:

{
    "filter": {
        "nested": {
            "path": "content",
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "content.assigned": false
                            }
                        },
                        {
                            "script": {
                                "script": "primary_content_type_id==content.content_type_id"
                            }
                        }
                    ]
                }
            }
        }
    }
}

Note that it works fine if I remove the script portion of the filter, replace it with another term filter where content_type_id = 12, and add another filter where primary_content_type_id = 12. The problem is that I will not know (nor does it matter for my use case) what the values of primary_content_type_id or content.content_type_id are. All that matters is that assigned is false for the content whose content_type_id matches the primary_content_type_id.
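To pin down the condition being asked for, here is a minimal sketch in plain Python that evaluates it against the sample document above. The function name `primary_content_unassigned` is made up for illustration; this runs outside Elasticsearch and only states the intended semantics.

```python
def primary_content_unassigned(source):
    """True when some content item has the document's primary content type
    and is not assigned, whatever the actual type id happens to be."""
    primary = source["primary_content_type_id"]
    return any(
        item["content_type_id"] == primary and not item["assigned"]
        for item in source["content"]
    )


# The sample document from the question.
doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

print(primary_content_unassigned(doc))  # True: item 5 has type 12 and is unassigned
```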

Is this check possible with elasticsearch?

Recommended answer

In the case of the nested search, you are searching the nested objects without the parent. Unfortunately, there is no hidden join that you can apply with nested objects.

At least currently, that means you do not receive both the "parent" and the nested document in the script. You can confirm this by replacing your script with each of these in turn and testing the result:

# Parent Document does not exist
"script": {
  "script": "doc['primary_content_type_id'].value == 12"
}

# Nested Document should exist
"script": {
  "script": "doc['content.content_type_id'].value == 12"
}
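Why the two test scripts behave differently can be pictured with a rough Python model. It assumes, as a simplification, that each nested object is indexed as its own hidden document carrying only its own fields, and that a script inside a nested filter sees just that hidden document. The helper `index_as_nested` is hypothetical and is not how Elasticsearch is implemented internally.

```python
def index_as_nested(source, path):
    """Simplified model: split a source document into hidden nested
    documents (fields prefixed with the path) plus a separate parent."""
    nested_docs = [
        {f"{path}.{key}": value for key, value in item.items()}
        for item in source[path]
    ]
    parent_doc = {key: value for key, value in source.items() if key != path}
    return nested_docs, parent_doc


doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

nested_docs, parent_doc = index_as_nested(doc, "content")

# A script running in the nested scope only has the nested fields to work with.
for nested in nested_docs:
    print("primary_content_type_id" in nested, "content.content_type_id" in nested)
```

In this model the parent field is simply absent from every nested document, which is why a comparison between the two cannot succeed inside the nested filter.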

You could do this in a less performant way by looping across the objects yourself (rather than having ES do it for you with nested). This means that you would have to reindex your documents and nested documents as a single document for this to work. Considering the way that you are trying to use it, this probably wouldn't be too different, and it may even perform better (especially given the lack of an alternative).

# This assumes that your default scripting language is Groovy (default in 1.4)
# Note1: "find" will loop across all of the values, but it will
#  appropriately short circuit if it finds any!
# Note2: It would be preferable to use doc throughout, but since we need the
#  arrays (plural!) to be in the _same_ order, then we need to parse the
#  _source. This inherently means that you must _store_ the _source, which
#  is the default. Parsing the _source only happens on the first touch.
"script": {
  "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && ! it.assigned } != null",
  "_cache" : true
}
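Note2 above is easier to see with a small Python model of why per-field values cannot be paired up by position. It assumes that values read through doc[...] come back as one sorted list per field rather than in _source order; `doc_values_view` is a hypothetical helper written for this illustration only.

```python
def doc_values_view(source, path):
    """Model of field values as seen through doc[...]: one list per field,
    sorted, with no link back to the object each value came from.
    ASSUMPTION: values are returned sorted, not in _source order."""
    fields = {}
    for item in source[path]:
        for key, value in item.items():
            fields.setdefault(f"{path}.{key}", []).append(value)
    return {name: sorted(values) for name, values in fields.items()}


doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

view = doc_values_view(doc, "content")
pairs_from_doc = list(zip(view["content.content_type_id"], view["content.assigned"]))
pairs_from_source = [(c["content_type_id"], c["assigned"]) for c in doc["content"]]

print(pairs_from_doc)     # [(1, False), (12, True)]  -> wrong pairing
print(pairs_from_source)  # [(1, True), (12, False)]  -> pairing as indexed
```

Under that assumption, pairing by position would report type 12 as assigned, which is the opposite of what the document says. Reading the objects from _source keeps each content_type_id together with its own assigned flag.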

I cached the result because nothing dynamic is occurring here (for instance, no dates are being compared to now), so it's pretty safe to cache, thereby making future lookups much faster. Most filters are cached by default, but scripts are one of the few exceptions.

Since it must compare both values to be sure that it found the correct inner object, you are duplicating some amount of work, but it's practically unavoidable. Having the term filter is most likely going to be superior to just doing this check without it.
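One way to combine the cheap term filter with the script check might look like the sketch below, which builds the request body as a Python dict and prints it as JSON. It assumes the 1.x-era filter syntax used in the question, that content has been reindexed as a plain (non-nested) object so the term filter needs no nested wrapper, and that a bool filter with two must clauses is an acceptable way to combine them. `build_filter` is a hypothetical helper, and the body has not been run against a cluster.

```python
import json

# The Groovy script from the answer, unchanged.
SCRIPT = (
    "_source.content.find { "
    "it.content_type_id == _source.primary_content_type_id && ! it.assigned "
    "} != null"
)


def build_filter():
    """Term filter narrows to documents with any unassigned content;
    the script then confirms the unassigned item is the primary one."""
    # ASSUMPTION: content is mapped as a plain object, not nested.
    term_prefilter = {"term": {"content.assigned": False}}
    script_check = {"script": {"script": SCRIPT, "_cache": True}}
    return {"filter": {"bool": {"must": [term_prefilter, script_check]}}}


print(json.dumps(build_filter(), indent=4))
```

If content stays mapped as nested, the term clause would presumably need to be wrapped in a nested filter with "path": "content", as in the question.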
