How to access doc values of a nested array in Elasticsearch script?

Problem description

Given the following index, how would I select the proper item in the nested array and access one of its values? The goal is to use that value inside a script_score.

# Create mapping
curl -XPUT localhost:9200/test/user/_mapping -d '
{
  "user" : {
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "skills" : {
        "type": "nested", 
        "properties" : {
          "skill_id" : {
            "type" : "integer"
          },
          "recommendations_count" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}
'

# Indexing Data
curl -XPUT localhost:9200/test/user/1 -d '
{
   "name": "John",
   "skills": [
      {
         "skill_id": 100,
         "recommendations_count": 5
      },
      {
         "skill_id": 200,
         "recommendations_count": 3
      }
   ]
}
'

curl -XPUT localhost:9200/test/user/2 -d '
{
   "name": "Mary",
   "skills": [
      {
         "skill_id": 100,
         "recommendations_count": 9
      },
      {
         "skill_id": 200,
         "recommendations_count": 0
      }
   ]
}
'

My query filters by skill_id and this works well. I then want to be able to use script_score to boost the score of the user documents with a higher recommendations_count for the given skill_id. (<-- this is key).

curl -XPOST localhost:9200/test/user/_search -d '
{      
    "query":{
      "function_score":{
        "query":{
          "bool":{
            "must":{
              "nested":{
                "path":"skills",
                "query":{
                  "bool":{
                    "must":{
                      "term":{
                        "skill_id":100
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "functions":[
          {
            "script_score": {
               "script": "sqrt(1.2 * doc['skills.recommendations_count'].value)"   
            }
          }            
        ]
      }
    }
}
'

How do I access the skills array from within the script, find the 'skill_id: 100' item in the array, and then use its recommendations_count value? The script_score above doesn't currently work (the score is always 0 regardless of the data, so I assume doc['skills.recommendations_count'].value is not looking in the right place).

Solution

For your specific question, the script needs the nested context, just like you did with the term query.

This can be rewritten for ES 1.x:

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "query": {
            "function_score": {
              "functions": [
                {
                  "script_score": {
                    "script": "sqrt(1.2 * doc['skills.recommendations_count'].value)"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

For ES 2.x (filters became first-class citizens in ES 2.x, so the syntax changed a bit to catch up!):

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "bool": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "must": {
            "function_score": {
              "functions": [
                {
                  "script_score": {
                    "script": "sqrt(1.2 * doc['skills.recommendations_count'].value)"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

Note: I made the term query a term filter because it has no logical impact on the score (it's either an exact match or not). I also added the nested field's name to the term filter, which is a requirement in Elasticsearch 2.x and later (and good practice earlier).
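
To make the nested-context point concrete, here is a rough model in shell of what the rewritten queries do: each skills entry is evaluated on its own, the filter keeps only the entry with the requested skill_id, and the script sees only that entry's recommendations_count. This is a simulation with awk under those assumptions, not something Elasticsearch runs; the function name nested_score and the "skill_id count" input format are made up for illustration.

```shell
# Hypothetical model of nested scoring: each "skill_id recommendations_count"
# line stands in for one nested skills object. Not an Elasticsearch API.
nested_score() {
  # $1 = skill_id to filter on; stdin = one "skill_id count" pair per line
  awk -v want="$1" '
    $1 == want { printf "%.4f\n", sqrt(1.2 * $2); found = 1 }
    END { if (!found) print "no match" }
  '
}

# John: skill 100 has 5 recommendations, skill 200 has 3
printf '100 5\n200 3\n' | nested_score 100   # prints 2.4495
# Mary: skill 100 has 9 recommendations, skill 200 has 0
printf '100 9\n200 0\n' | nested_score 100   # prints 3.2863
```

The _score Elasticsearch actually returns can differ from these numbers, since it also depends on settings such as the nested query's score_mode and the function_score boost_mode.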

With that out of the way, you can (and should) avoid using a script whenever possible. This is one of those cases. function_score supports the concept of a field_value_factor function that lets you do things exactly like you are trying, but entirely without a script. You can also optionally supply a "missing" value to control what happens if the field is missing.

This computes exactly the same value as the script, but it will perform better. For ES 1.x:

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "query": {
            "function_score": {
              "functions": [
                {
                  "field_value_factor": {
                    "field": "skills.recommendations_count",
                    "factor": 1.2,
                    "modifier": "sqrt",
                    "missing": 0
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

For ES 2.x:

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "bool": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "must": {
            "function_score": {
              "functions": [
                {
                  "field_value_factor": {
                    "field": "skills.recommendations_count",
                    "factor": 1.2,
                    "modifier": "sqrt",
                    "missing": 0
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'
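
As a sanity check on what field_value_factor should compute for the sample documents, the arithmetic can be sketched in shell. The field value (or the "missing" value, when the field is absent) is multiplied by the factor, and then the sqrt modifier is applied. The function name field_value_factor_sqrt and its argument order are hypothetical; this only mirrors the formula and does not call Elasticsearch.

```shell
# Hypothetical stand-in for field_value_factor with modifier "sqrt".
# $1 = field value (empty string if the document lacks the field)
# $2 = factor
# $3 = value substituted when the field is missing
field_value_factor_sqrt() {
  value="${1:-$3}"   # fall back to the "missing" value when $1 is empty
  awk -v v="$value" -v f="$2" 'BEGIN { printf "%.4f\n", sqrt(f * v) }'
}

field_value_factor_sqrt 5 1.2 0    # John, skill_id 100: prints 2.4495
field_value_factor_sqrt 9 1.2 0    # Mary, skill_id 100: prints 3.2863
field_value_factor_sqrt "" 1.2 0   # field missing: prints 0.0000
```

These match sqrt(1.2 * recommendations_count) from the script version, so Mary should rank above John for skill_id 100 with either approach.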

Scripts are slow, and in Elasticsearch 1.x they also imply the use of fielddata, which is bad because fielddata is loaded into heap memory. You did mention doc values, which is a promising sign that you are using Elasticsearch 2.x, but that may have just been terminology.

If you're just starting with Elasticsearch, then I strongly recommend starting with the latest version.
