Elasticsearch - Script Filter over a list of nested objects
Problem description
I am trying to figure out how to solve two problems I have with my ES 5.6 index, which uses this mapping:
"mappings": {
  "my_test": {
    "properties": {
      "Employee": {
        "type": "nested",
        "properties": {
          "Name": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          },
          "Surname": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}
I need to create two separate scripted filters:
1 - Filter documents where the size of the Employee array is exactly 3
2 - Filter documents where the first element of the array has "Name" == "John"
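Before involving Elasticsearch at all, the two conditions can be written as plain predicates over a document's _source. The sketch below is a hypothetical Python model, not Elasticsearch code. It assumes _source has been deserialized into a dict whose "Employee" entry is a list of objects with "Name" and "Surname" keys, and the function names are illustrative only. It also handles the case where a single employee was indexed as an object rather than a one-element array, because _source keeps whatever shape was originally sent.

```python
from typing import Any, Dict, List


def employees_of(source: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the Employee entries of a _source dict as a list (hypothetical helper)."""
    employees = source.get("Employee")
    if employees is None:
        return []
    if isinstance(employees, dict):  # a single object instead of an array
        return [employees]
    return list(employees)


def has_employee_count(source: Dict[str, Any], expected: int) -> bool:
    """Condition 1: the Employee array has exactly `expected` elements."""
    return len(employees_of(source)) == expected


def first_employee_named(source: Dict[str, Any], name: str) -> bool:
    """Condition 2: the first element of the Employee array has Name == name."""
    employees = employees_of(source)
    return len(employees) > 0 and employees[0].get("Name") == name


doc = {
    "Employee": [
        {"Name": "John", "Surname": "Smith"},
        {"Name": "Jane", "Surname": "Doe"},
        {"Name": "Bob", "Surname": "Brown"},
    ]
}
print(has_employee_count(doc, 3))         # True
print(first_employee_named(doc, "John"))  # True
```

The lowercase_normalizer only affects the indexed keyword values, not _source. A name comparison made against _source is therefore case-sensitive.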
I tried to make some first steps, but I am unable to iterate over the list. I always get a null pointer exception:
{
  "bool": {
    "must": {
      "nested": {
        "path": "Employee",
        "query": {
          "bool": {
            "filter": [
              {
                "script": {
                  "script" : """
                    int array_length = 0;
                    for(int i = 0; i < params._source['Employee'].length; i++)
                    {
                      array_length +=1;
                    }
                    if(array_length == 3)
                    {
                      return true
                    } else
                    {
                      return false
                    }
                  """
                }
              }
            ]
          }
        }
      }
    }
  }
}
Recommended answer
As Val noticed, you can't access the _source of documents in script queries in recent versions of Elasticsearch. However, Elasticsearch does let you access _source in the "score context".
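So one possible workaround (but be careful about performance) is to use a scripted score combined with a min_score in your query. The script returns a positive score for documents that satisfy your conditions and 0 for the others. Because min_score drops every document whose score is below the threshold, the score effectively acts as a filter.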
You can find an example of this behavior in this Stack Overflow post: Query documents by sum of nested field values in elasticsearch.
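The following simplified Python model, which is not Elasticsearch code, shows why a 0/1 score combined with min_score behaves like a filter. It makes two assumptions: the inner query is match_all, which gives every document a score of 1.0, and function_score keeps its default boost_mode of multiply. The function and field names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def function_score_search(
    docs: List[Doc],
    script_score: Callable[[Doc], float],
    min_score: float,
    query_score: float = 1.0,  # match_all scores every document 1.0
) -> List[Doc]:
    """Model of function_score + min_score: keep docs whose final score >= min_score."""
    hits = []
    for doc in docs:
        # boost_mode "multiply": final score = query score * function score
        final_score = query_score * script_score(doc)
        # min_score excludes hits whose score is below the threshold
        if final_score >= min_score:
            hits.append(doc)
    return hits


def script(doc: Doc) -> float:
    """Stand-in for the Painless script: 1 if 3 employees and the first is John, else 0."""
    employees = doc["Employee"]
    return 1.0 if len(employees) == 3 and employees[0]["Name"] == "John" else 0.0


docs = [
    {"id": 1, "Employee": [{"Name": "John"}, {"Name": "Ann"}, {"Name": "Bob"}]},
    {"id": 2, "Employee": [{"Name": "Jane"}, {"Name": "John"}, {"Name": "Bob"}]},
    {"id": 3, "Employee": [{"Name": "John"}]},
]
hits = function_score_search(docs, script, min_score=0.1)
print([d["id"] for d in hits])  # [1]
```

Any threshold strictly between 0 and 1 would work here. The value 0.1 simply leaves a margin.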
In your case, a query like this should do the job:
POST <your_index>/_search
{
  "min_score": 0.1,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                // Use == for value equality; in Painless === tests reference identity
                if (params["_source"]["Employee"].size() == params.nbEmployee) {
                  def firstEmployee = params._source["Employee"].get(0);
                  if (firstEmployee.Name == params.name) {
                    return 1;
                  } else {
                    return 0;
                  }
                } else {
                  return 0;
                }
              """,
              "params": {
                "nbEmployee": 3,
                "name": "John"
              }
            }
          }
        }
      ]
    }
  }
}
Pass the number of employees and the name in params. That way the script does not have to be recompiled for every use case.
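Passing values through params keeps the script source text identical across requests, which is what lets Elasticsearch reuse the compiled script. The hypothetical Python helper below, build_search_body, makes this visible. It builds the request body from a Painless source equivalent to the one in the query above. It only produces a dict and sends nothing to a cluster.

```python
import json
from typing import Any, Dict

# Painless source, equivalent to the query above; it never changes between requests.
SCRIPT_SOURCE = (
    "if (params._source.Employee.size() == params.nbEmployee) {"
    " return params._source.Employee.get(0).Name == params.name ? 1 : 0; }"
    " return 0;"
)


def build_search_body(nb_employee: int, name: str) -> Dict[str, Any]:
    """Hypothetical helper: only the params vary, the script source stays the same."""
    return {
        "min_score": 0.1,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": SCRIPT_SOURCE,
                                "params": {"nbEmployee": nb_employee, "name": name},
                            }
                        }
                    }
                ],
            }
        },
    }


def script_of(body: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the script object from a body built by build_search_body."""
    return body["query"]["function_score"]["functions"][0]["script_score"]["script"]


a = build_search_body(3, "John")
b = build_search_body(2, "Jane")
print(script_of(a)["source"] == script_of(b)["source"])  # True
print(json.dumps(a, indent=2))
```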
But remember that, as Val already mentioned, this can be very heavy on your cluster. You should narrow the set of documents the script runs on by adding filters to the query inside function_score (match_all in my example).
In any case, this is not how Elasticsearch is meant to be used, and you can't expect good performance from such a hacky query.
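One way to follow that advice is to replace match_all with a cheap nested term query. The script then only runs on documents that have at least one employee with the requested name. This prefilter matches a superset of the wanted documents, and the script still checks the count and the first position.

This is a sketch under a few assumptions:
- The term is written in lowercase so that it matches the values lowercased by lowercase_normalizer, whether or not your version normalizes term queries.
- boost_mode is set to replace so that the final score is exactly the script's 0 or 1. Otherwise the nested query's own relevance score could multiply a match below min_score.
- build_narrowed_body is a hypothetical helper, and the body is only built here, not sent.

```python
from typing import Any, Dict

# Painless source, equivalent to the query in the answer.
SCRIPT_SOURCE = (
    "if (params._source.Employee.size() == params.nbEmployee) {"
    " return params._source.Employee.get(0).Name == params.name ? 1 : 0; }"
    " return 0;"
)


def build_narrowed_body(nb_employee: int, name: str) -> Dict[str, Any]:
    """Hypothetical helper: prefilter with a nested term query instead of match_all."""
    prefilter = {
        "nested": {
            "path": "Employee",
            # Indexed Name values are lowercased by the normalizer
            "query": {"term": {"Employee.Name": name.lower()}},
        }
    }
    return {
        "min_score": 0.1,
        "query": {
            "function_score": {
                "query": prefilter,
                # Final score = script score, ignoring the prefilter's relevance score
                "boost_mode": "replace",
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": SCRIPT_SOURCE,
                                # _source is not normalized, so keep the original case here
                                "params": {"nbEmployee": nb_employee, "name": name},
                            }
                        }
                    }
                ],
            }
        },
    }


body = build_narrowed_body(3, "John")
```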