elasticsearch - Return term frequency of a single field
Problem description
I have been trying to use a facet to get the term frequency of a field. My query returns just one hit, so I would like the facet to return the terms with the highest frequency in a particular field.
My mapping:
{
  "mappings": {
    "document": {
      "properties": {
        "tags": {
          "type": "object",
          "properties": {
            "title": {
              "fields": {
                "partial": {
                  "search_analyzer": "main",
                  "index_analyzer": "partial",
                  "type": "string",
                  "index": "analyzed"
                },
                "title": {
                  "type": "string",
                  "analyzer": "main",
                  "index": "analyzed"
                }
              },
              "type": "multi_field"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "side": "front",
          "max_gram": 50,
          "min_gram": 2,
          "type": "edgeNGram"
        }
      },
      "analyzer": {
        "main": {
          "filter": ["standard", "lowercase", "asciifolding"],
          "type": "custom",
          "tokenizer": "standard"
        },
        "partial": {
          "filter": ["standard", "lowercase", "asciifolding", "name_ngrams"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
Test data:
curl -XPOST localhost:9200/testindex/document -d '{"tags": {"title": "people also kill people"}}'
Query:
curl -XGET 'localhost:9200/testindex/document/_search?pretty=1' -d '
{
  "query": {
    "term": { "tags.title": "people" }
  },
  "facets": {
    "popular_tags": { "terms": { "field": "tags.title" } }
  }
}'
This returns the following result:
"hits" : {
  "total" : 1,
  "max_score" : 0.99381393,
  "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393, "_source" : {"tags": {"title": "people also kill people"}}
  } ]
},
"facets" : {
  "popular_tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "people",
      "count" : 1 // I expect this to be 2
    }, {
      "term" : "kill",
      "count" : 1
    }, {
      "term" : "also",
      "count" : 1
    } ]
  }
}
This is not what I want. I want the count for "people" to be 2:
"hits" : {
  "total" : 1,
  "max_score" : 0.99381393,
  "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393, "_source" : {"tags": {"title": "people also kill people"}}
  } ]
},
"facets" : {
  "popular_tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "people",
      "count" : 2
    }, {
      "term" : "kill",
      "count" : 1
    }, {
      "term" : "also",
      "count" : 1
    } ]
  }
}
How do I achieve this? Is a facet the wrong way to go?
A facet counts documents, not the terms within them. You get 1 because only one document contains that term; how many times the term occurs in that document doesn't matter. I'm not aware of an out-of-the-box way to return the term frequency, so a facet is not a good choice here.
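To make the distinction concrete, here is a minimal Python sketch of the two ways of counting. It is not Elasticsearch code: `tokenize` is an assumed stand-in for the `main` analyzer (lowercase plus whitespace split), and the function names `facet_style_counts` and `term_frequencies` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercasing and splitting on whitespace roughly
    # approximates the "main" analyzer for this simple test string.
    return text.lower().split()


def facet_style_counts(docs):
    # Count each term at most once per document, which is how the
    # terms facet behaves: it counts documents containing the term.
    counts = Counter()
    for text in docs:
        counts.update(set(tokenize(text)))
    return counts


def term_frequencies(docs):
    # Count every occurrence of each term across all documents.
    counts = Counter()
    for text in docs:
        counts.update(tokenize(text))
    return counts


docs = ["people also kill people"]
print(facet_style_counts(docs)["people"])  # 1
print(term_frequencies(docs)["people"])    # 2
```

With a single document, the facet-style count can never exceed 1 for any term, which matches the result in the question.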
That information could be stored in the index if you enable term vectors, but as of now there is no way to read the term vectors back from elasticsearch.
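Because the query in the question returns only one hit, one possible workaround is to count the terms client-side from the `_source` of the returned hits. The following is a rough Python sketch under stated assumptions: the response has the shape shown in the question, tokenization is a simple lowercase and whitespace split rather than the real `main` analyzer, and the helper name `title_term_frequencies` is hypothetical. The counts therefore only approximate what the index holds.

```python
import json
from collections import Counter


def title_term_frequencies(response):
    # Walk the hits of a parsed search response and count every
    # occurrence of each term in tags.title.
    counts = Counter()
    for hit in response["hits"]["hits"]:
        title = hit["_source"]["tags"]["title"]
        # Assumption: lowercase + whitespace split stands in for the analyzer.
        counts.update(title.lower().split())
    return counts


# A trimmed copy of the search response from the question.
response = json.loads("""
{"hits": {"total": 1, "hits": [
  {"_index": "testindex", "_type": "document",
   "_source": {"tags": {"title": "people also kill people"}}}
]}}
""")

print(title_term_frequencies(response).most_common())
# [('people', 2), ('also', 1), ('kill', 1)]
```

This only scales to small result sets, since every matching document's source has to be fetched and tokenized on the client.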