Elasticsearch - Script Filter over a list of nested objects
Problem description
I am trying to figure out how to solve two problems with my ES 5.6 index, which has the following mapping:
"mappings": {
  "my_test": {
    "properties": {
      "Employee": {
        "type": "nested",
        "properties": {
          "Name": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          },
          "Surname": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}
I need to create two separate scripted filters:
1 - Filter documents where the size of the Employee array is == 3
2 - Filter documents where the first element of the array has "Name" == "John"
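Before writing any Painless, it may help to pin down exactly what the two filters mean. The sketch below restates them as plain Python predicates over a document's _source. The names has_employee_count and first_employee_named, and the sample document, are hypothetical. The comparison is case-sensitive because _source keeps the original values, not the output of lowercase_normalizer.

```python
from typing import Any, Dict

# Hypothetical helpers: plain-Python restatements of the two requested filters,
# applied to a document's _source (not Elasticsearch code).


def has_employee_count(source: Dict[str, Any], expected: int = 3) -> bool:
    """Filter 1: the Employee array contains exactly `expected` entries."""
    employees = source.get("Employee")
    # A missing field or a single object (not a list) does not match.
    return isinstance(employees, list) and len(employees) == expected


def first_employee_named(source: Dict[str, Any], name: str = "John") -> bool:
    """Filter 2: the first element of the Employee array has Name == name."""
    employees = source.get("Employee")
    if not isinstance(employees, list) or not employees:
        return False
    # Case-sensitive, because _source holds the original (non-normalized) value.
    return employees[0].get("Name") == name


doc = {"Employee": [{"Name": "John", "Surname": "Doe"},
                    {"Name": "Jane", "Surname": "Roe"},
                    {"Name": "Max", "Surname": "Poe"}]}
print(has_employee_count(doc), first_employee_named(doc))  # True True
```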
I tried to take some first steps, but I am unable to iterate over the list: I always get a null pointer exception.
{
  "bool": {
    "must": {
      "nested": {
        "path": "Employee",
        "query": {
          "bool": {
            "filter": [
              {
                "script": {
                  "script" : """
                    int array_length = 0;
                    for(int i = 0; i < params._source['Employee'].length; i++)
                    {
                      array_length +=1;
                    }
                    if(array_length == 3)
                    {
                      return true
                    } else
                    {
                      return false
                    }
                  """
                }
              }
            ]
          }
        }
      }
    }
  }
}
Recommended answer
As Val noticed, you can't access the _source of documents in script queries in recent versions of Elasticsearch, which explains the null pointer exception in the loop above.
But Elasticsearch does allow you to access _source in the "score context".
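So a possible workaround (though you need to be careful about performance) is to use a script_score function combined with min_score in your query. The script returns a non-zero score only for matching documents, and min_score drops every other hit.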
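The sketch below models in plain Python why this trick filters documents. It is a simplified model, not Elasticsearch code, and the names search_with_min_score and three_employees are hypothetical. match_all gives every document a query score of 1.0. With function_score's default boost_mode ("multiply"), the final score is 1.0 times the script result. Hits scoring below min_score are excluded.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def search_with_min_score(docs: List[Doc],
                          script: Callable[[Doc], float],
                          min_score: float) -> List[Doc]:
    """Simplified model of function_score + script_score + min_score.

    match_all scores every document 1.0; with the default boost_mode
    ("multiply") the final score is 1.0 * script result. Hits whose
    final score is below min_score are dropped.
    """
    hits = []
    for source in docs:
        final_score = 1.0 * script(source)
        if final_score >= min_score:
            hits.append(source)
    return hits


# Hypothetical stand-in for the Painless script: 1 if exactly 3 employees, else 0.
def three_employees(source: Doc) -> float:
    return 1.0 if len(source.get("Employee", [])) == 3 else 0.0


docs = [
    {"Employee": [{"Name": "John"}, {"Name": "Jane"}, {"Name": "Max"}]},
    {"Employee": [{"Name": "John"}]},
]
print(len(search_with_min_score(docs, three_employees, min_score=0.1)))  # 1
```

Any min_score strictly between 0 and 1 works here, because the script only ever returns 0 or 1.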
You can find an example of this approach in the Stack Overflow post "Query documents by sum of nested field values in elasticsearch".
In your case, a query like this can do the job:
POST <your_index>/_search
{
  "min_score": 0.1,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                if (params["_source"]["Employee"].size() == params.nbEmployee) {
                  def firstEmployee = params._source["Employee"].get(0);
                  if (firstEmployee.Name == params.name) {
                    return 1;
                  } else {
                    return 0;
                  }
                } else {
                  return 0;
                }
              """,
              "params": {
                "nbEmployee": 3,
                "name": "John"
              }
            }
          }
        }
      ]
    }
  }
}
The expected number of employees and the name are passed in params, so the script does not have to be rewritten for each use case.
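Because the script text stays constant, it can live in one place while only params change from request to request. The sketch below builds the request body in Python. EMPLOYEE_SCRIPT and build_employee_query are hypothetical names, and the script is a compact equivalent of the one in the answer. The resulting dict is meant to be serialized and sent as the body of POST <your_index>/_search with whatever HTTP client you use.

```python
import json
from typing import Any, Dict

# Compact equivalent of the answer's Painless script. It uses == for the length
# check, because in Painless === is reference equality. The text never changes;
# only "params" varies between requests.
EMPLOYEE_SCRIPT = """
if (params["_source"]["Employee"].size() == params.nbEmployee) {
  def firstEmployee = params._source["Employee"].get(0);
  return firstEmployee.Name == params.name ? 1 : 0;
}
return 0;
"""


def build_employee_query(nb_employee: int, name: str,
                         min_score: float = 0.1) -> Dict[str, Any]:
    """Hypothetical builder for the search body of the min_score workaround."""
    return {
        "min_score": min_score,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": EMPLOYEE_SCRIPT,
                                "params": {"nbEmployee": nb_employee, "name": name},
                            }
                        }
                    }
                ],
            }
        },
    }


john = build_employee_query(3, "John")
jane = build_employee_query(2, "Jane")
params = john["query"]["function_score"]["functions"][0]["script_score"]["script"]["params"]
print(json.dumps(params))  # {"nbEmployee": 3, "name": "John"}
```

json.dumps escapes the script's newlines as \n. This produces plain JSON, so the Console-only triple-quote syntax is not needed.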
But remember, as Val already mentioned, that this can be very heavy on your cluster. You should narrow the set of documents the script is applied to by adding filters to the query inside function_score (match_all in my example).
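For the combined query, one way to apply that advice is shown in the sketch below. The pre-filter is an assumption and not part of the original answer, and build_narrowed_query is a hypothetical name. match_all is replaced with a nested match on Employee.Name, so the script only runs on documents that contain an employee with that name somewhere in the array. The sketch assumes lowercase_normalizer only lowercases, since its definition isn't shown in the question. It also sets boost_mode to "replace", because a filter-only query can give every document a score of 0, and the default "multiply" would then zero every hit.

```python
import json
from typing import Any, Dict


def build_narrowed_query(script_source: str, nb_employee: int, name: str,
                         min_score: float = 0.1) -> Dict[str, Any]:
    """Hypothetical builder: same script_score + min_score trick, but the
    script only runs on documents with at least one employee whose Name
    matches (a necessary condition for the "first employee" check)."""
    return {
        "min_score": min_score,
        "query": {
            "function_score": {
                # Pre-filter instead of match_all. The value is lowercased to line
                # up with the terms indexed through lowercase_normalizer (assumed to
                # only lowercase), so it should not exclude a document the
                # case-sensitive script would accept.
                "query": {
                    "bool": {
                        "filter": [
                            {
                                "nested": {
                                    "path": "Employee",
                                    "query": {"match": {"Employee.Name": name.lower()}},
                                }
                            }
                        ]
                    }
                },
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": script_source,
                                "params": {"nbEmployee": nb_employee, "name": name},
                            }
                        }
                    }
                ],
                # A filter-only query can score 0; "replace" makes the final
                # score equal to the script result alone.
                "boost_mode": "replace",
            }
        },
    }


# Placeholder script text: pass the Painless script from the answer here.
body = build_narrowed_query("/* Painless script from the answer */", 3, "John")
print(json.dumps(body, indent=2))
```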
In any case, this is not how Elasticsearch is meant to be used, and you can't expect good performance from such a hacked query.