Using nested values in script_score
Question
I am trying to use nested values in a script score, but I cannot get it to work because I am unable to iterate over the field when I access it through doc. Also, when I query it in Kibana with _type:images AND _exists_:colors, it does not match any documents, even though the field is clearly present in all of my documents when I view them individually. I can access the field through params._source, but I have read that this can be slow and is not really recommended.
I know this issue comes entirely from the way we created the nested field. If I cannot come up with something better, I will have to reindex our 2m+ documents and look for another way around the problem. I would like to avoid that, and I would also like to better understand how Elastic works behind the scenes and why it behaves the way it does here.
The example here is not my real-life issue, but it describes the problem just as well. Imagine a document that describes an image. It has a field containing values for how much red, green, and blue the image contains.
Requests to create the index and the documents, with a nested field containing an array of colors that split 100 points between them:
PUT images
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": { "type": "integer" },
        "title": { "type": "text" },
        "description": { "type": "text" },
        "colors": {
          "type": "nested",
          "properties": {
            "red": { "type": "double" },
            "green": { "type": "double" },
            "blue": { "type": "double" }
          }
        }
      }
    }
  }
}
PUT images/_doc/1
{
  "id": 1,
  "title": "Red Image",
  "description": "Description of Red Image",
  "colors": [
    { "red": 100 },
    { "green": 0 },
    { "blue": 0 }
  ]
}
PUT images/_doc/2
{
  "id": 2,
  "title": "Green Image",
  "description": "Description of Green Image",
  "colors": [
    { "red": 0 },
    { "green": 100 },
    { "blue": 0 }
  ]
}
PUT images/_doc/3
{
  "id": 3,
  "title": "Blue Image",
  "description": "Description of Blue Image",
  "colors": [
    { "red": 0 },
    { "green": 0 },
    { "blue": 100 }
  ]
}
Now, if I run this query, using doc:
GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for (color in doc["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}
I get the exception No field found for [colors] in mapping with types [], but if I use params._source instead, like so:
GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for (color in params._source["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}
I get the output "caused_by": {"type": "exception", "reason": "100"}, so I know it worked, since the first document is red and has a value of 100.
I am not even sure this qualifies as a question; it is more a cry for help. If someone can explain why this behaves the way it does and suggest the best way around the issue, I would really appreciate it.
(Also, some tips on debugging in Painless would be lovely!)
Recommended answer
Don't worry about the slowness of params._source. It is your only choice here: nested objects are indexed as separate hidden documents, so doc in a top-level script cannot see colors, and a script running inside the nested context can only access a single nested color at a time.
Try this:
GET images/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "image"
          }
        },
        {
          "function_score": {
            "functions": [
              {
                "script_score": {
                  "script": {
                    "source": """
                      def score = 0;
                      for (color in params._source["colors"]) {
                        // Debug.explain(color);
                        if (color.containsKey('red')) {
                          score += color['red'];
                        }
                      }
                      return score;
                    """
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
The Painless score context is documented in the Elasticsearch reference.
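To make the single-nested-color limitation concrete, here is a hedged sketch that is not part of the original answer. It assumes the images index and mapping from the question, and it assumes the per-color logic can be expressed one nested object at a time. The function_score is wrapped in a nested query, so the script runs once per nested color object and doc['colors.red'] refers only to that one object. The score_mode of sum is an illustrative choice for combining the per-object scores into the parent document's score.

```json
GET images/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": { "exists": { "field": "colors.red" } },
          "script_score": {
            "script": { "source": "doc['colors.red'].value" }
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}
```

The exists clause restricts the script to nested objects that actually carry a red value. Logic that needs to compare several colors of the same image inside one script cannot be written this way, which is why params._source remains the option for that case.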
Secondly, you were pretty close with throwing an exception manually, but there is a cleaner way to do it. Uncomment Debug.explain(color); and you're good to go.
One more thing: I purposely added a match query to increase the scores but, more importantly, to illustrate how a query is built in the background. When you rerun the above under GET images/_validate/query?explain, you'll see for yourself.
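As for the Kibana search _type:images AND _exists_:colors from the question returning nothing, a likely explanation is the same hidden-document layout: the values of a nested field are not visible to a plain top-level query. In the example mapping the type is also _doc rather than images, so _type:images would not match there on its own. A minimal sketch, assuming the example index from the question, that checks for the presence of a nested value with a nested query instead:

```json
GET images/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "query": {
        "exists": { "field": "colors.red" }
      }
    }
  }
}
```

The exists clause targets a leaf field (colors.red) under the nested path. Swap in colors.green or colors.blue to test for the other values.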