Elasticsearch: Influence scoring with custom score field in document
Question
I have a set of words extracted from text by NLP algorithms, with an associated score for each word in every document.
For example:
document 1: { "vocab": [ {"wtag":"James Bond", "rscore": 2.14 },
                         {"wtag":"world", "rscore": 0.86 },
                         ....,
                         {"wtag":"somemore", "rscore": 3.15 }
                       ]
            }
document 2: { "vocab": [ {"wtag":"hiii", "rscore": 1.34 },
                         {"wtag":"world", "rscore": 0.94 },
                         ....,
                         {"wtag":"somemore", "rscore": 3.23 }
                       ]
            }
I want the rscore values of the matched wtag entries in each document to affect the _score that ES assigns to it, perhaps by being multiplied with or added to the _score, so that they influence the final _score (and, in turn, the ordering) of the resulting documents. Is there any way to achieve this?
Answer
Another way of approaching this would be to use nested documents:
First, set up the mapping to make vocab a nested field, meaning that each wtag/rscore entry is indexed internally as a separate document:
curl -XPUT "http://localhost:9200/myindex/" -d'
{
  "settings": {"number_of_shards": 1},
  "mappings": {
    "mytype": {
      "properties": {
        "vocab": {
          "type": "nested",
          "properties": {
            "wtag": {
              "type": "string"
            },
            "rscore": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}'
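To illustrate why the nested type matters here, the following Python sketch contrasts how an array of objects is conceptually flattened when it is not nested with how nested entries stay intact. It is a simplified model, not Elasticsearch code: the helper names `flatten_object_array` and `nested_entries` are hypothetical, and the real index structures are Lucene documents rather than Python dictionaries.

```python
def flatten_object_array(doc, field):
    """Model a plain (non-nested) object array: each sub-field becomes
    one multi-valued field, so the wtag/rscore pairing is lost."""
    flat = {}
    for entry in doc[field]:
        for key, value in entry.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def nested_entries(doc, field):
    """Model a nested field: every entry is kept as its own small
    document, so each wtag stays attached to its rscore."""
    return [dict(entry) for entry in doc[field]]


doc1 = {
    "vocab": [
        {"wtag": "James Bond", "rscore": 2.14},
        {"wtag": "world", "rscore": 0.86},
        {"wtag": "somemore", "rscore": 3.15},
    ]
}

flat = flatten_object_array(doc1, "vocab")
nested = nested_entries(doc1, "vocab")

print(flat)    # two independent lists: vocab.wtag and vocab.rscore
print(nested)  # three separate entries, each with its own wtag and rscore
```

In the flattened form there is no reliable way to tell which rscore belongs to the wtag that matched, which is why the per-word scores need the nested representation.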
Then index your docs:
curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
  "vocab": [
    {
      "wtag": "James Bond",
      "rscore": 2.14
    },
    {
      "wtag": "world",
      "rscore": 0.86
    },
    {
      "wtag": "somemore",
      "rscore": 3.15
    }
  ]
}'
curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'
{
  "vocab": [
    {
      "wtag": "hiii",
      "rscore": 1.34
    },
    {
      "wtag": "world",
      "rscore": 0.94
    },
    {
      "wtag": "somemore",
      "rscore": 3.23
    }
  ]
}'
Then run a nested query that matches the nested documents and adds up the rscore values of every nested document that matches:
curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
  "query": {
    "nested": {
      "path": "vocab",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "vocab.wtag": "james bond world"
            }
          },
          "script_score": {
            "script": "doc[\"rscore\"].value"
          }
        }
      }
    }
  }
}'
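As a rough illustration of how the final score of a parent document could come together, the Python sketch below models the arithmetic only. It assumes that function_score combines the inner query score with the script result using its default boost_mode of multiply, and that the nested score_mode of sum adds up the scores of all matching nested entries. The function name `nested_sum_score` and the per-word match scores are hypothetical; real match scores depend on the similarity model and the index contents.

```python
def nested_sum_score(vocab, match_scores, boost_mode="multiply"):
    """Sum the per-entry scores of all vocab entries that matched.

    match_scores maps a wtag to the (hypothetical) score the inner
    match query gave it; entries missing from it did not match.
    """
    total = 0.0
    for entry in vocab:
        tag = entry["wtag"]
        if tag not in match_scores:
            continue  # this nested entry did not match the query
        query_score = match_scores[tag]
        rscore = entry["rscore"]
        if boost_mode == "multiply":
            total += query_score * rscore
        elif boost_mode == "sum":
            total += query_score + rscore
        elif boost_mode == "replace":
            total += rscore  # ignore the query score, keep only rscore
        else:
            raise ValueError(f"unsupported boost_mode: {boost_mode}")
    return total


doc1 = [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    {"wtag": "somemore", "rscore": 3.15},
]
doc2 = [
    {"wtag": "hiii", "rscore": 1.34},
    {"wtag": "world", "rscore": 0.94},
    {"wtag": "somemore", "rscore": 3.23},
]

# Hypothetical inner-query scores for the search "james bond world".
match_scores = {"James Bond": 1.0, "world": 0.5}

score1 = nested_sum_score(doc1, match_scores)
score2 = nested_sum_score(doc2, match_scores)
print(score1, score2)  # doc1 ranks above doc2 under these assumptions
```

Under this model, setting boost_mode to replace would make the parent score the plain sum of the matched rscore values, while sum would add each rscore to the query score instead of multiplying, which corresponds to the two options raised in the question.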