elasticsearch - Return term frequency of a single field


Problem description

I have been trying to use a facet to get the term frequency of a field. My query returns just one hit, so I would like the facet to return the terms that occur most frequently in a particular field.

My mapping:

{
"mappings":{
    "document":{
        "properties":{
            "tags":{
                "type":"object",
                "properties":{
                    "title":{
                        "fields":{
                            "partial":{
                                "search_analyzer":"main",
                                "index_analyzer":"partial",
                                "type":"string",
                                "index" : "analyzed"
                            },
                            "title":{
                                "type":"string",
                                "analyzer":"main",
                                "index" : "analyzed"
                            }
                        },
                        "type":"multi_field"
                    }
                }
            }
        }
    }
},

"settings":{
    "analysis":{
        "filter":{
            "name_ngrams":{
                "side":"front",
                "max_gram":50,
                "min_gram":2,
                "type":"edgeNGram"
            }
        },

        "analyzer":{
            "main":{
                "filter": ["standard", "lowercase", "asciifolding"],
                "type": "custom",
                "tokenizer": "standard"
            },
            "partial":{
                "filter":["standard","lowercase","asciifolding","name_ngrams"],
                "type": "custom",
                "tokenizer": "standard"
            }
        }
    }
}

}

Test data:

 curl -XPOST localhost:9200/testindex/document -d '{"tags": {"title": "people also kill people"}}'

Query:

 curl -XGET 'localhost:9200/testindex/document/_search?pretty=1' -d '
{
    "query":
    {
       "term": { "tags.title": "people" }
    },
    "facets": {
       "popular_tags": { "terms": {"field": "tags.title"}}
    }
}'

This returns the following result:

"hits" : {
   "total" : 1,
    "max_score" : 0.99381393,
    "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393, "_source" : {"tags": {"title": "people also kill people"}}
 } ]
},
"facets" : {
  "popular_tags" : {
  "_type" : "terms",
  "missing" : 0,
  "total" : 3,
  "other" : 0,
  "terms" : [ {
    "term" : "people",
    "count" : 1            // I expect this to be 2
   }, {
    "term" : "kill",
    "count" : 1
  }, {
    "term" : "also",
    "count" : 1
  } ]
}

}

The above result is not what I want. I want the count for "people" to be 2, like this:

"hits" : {
   "total" : 1,
   "max_score" : 0.99381393,
   "hits" : [ {
   "_index" : "testindex",
   "_type" : "document",
   "_id" : "uI5k0wggR9KAvG9o7S7L2g",
   "_score" : 0.99381393, "_source" : {"tags": {"title": "people also kill people"}}
} ]
},
"facets" : {
"popular_tags" : {
  "_type" : "terms",
  "missing" : 0,
  "total" : 3,
  "other" : 0,
  "terms" : [ {
    "term" : "people",
    "count" : 2            
  }, {
    "term" : "kill",
    "count" : 1
  }, {
    "term" : "also",
    "count" : 1
  } ]
 }
}

How do I achieve this? Is a facet the wrong way to go?

Solution

A facet counts documents, not the occurrences of terms within them. You get 1 because only one document contains that term; how many times the term occurs inside that document doesn't matter. I'm not aware of an out-of-the-box way to return the term frequency, so the facet is not a good choice here.
That information could be stored in the index if you enable term vectors, but at the moment there's no way to read the term vectors back from elasticsearch.
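
To make the distinction concrete, here is a minimal Python sketch, run outside elasticsearch, that contrasts the two ways of counting. The helper names (`analyze`, `document_counts`, `term_frequencies`) are hypothetical, and `analyze` is only a rough stand-in for the "main" analyzer from the mapping: it splits on non-word characters and lowercases, and does not reproduce ASCII folding. The same kind of counting could be applied client-side to the `_source` of the returned hits, assuming the field text is available there.

```python
import re
from collections import Counter


def analyze(text):
    """Rough approximation of the 'main' analyzer: word split plus lowercase.

    Assumption: this ignores asciifolding and the finer rules of the
    standard tokenizer; it is only meant for illustration.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


def document_counts(docs):
    """Per term, count how many documents contain it.

    This mirrors what the terms facet reports: a term is counted
    once per document, no matter how often it appears in it.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc)))
    return counts


def term_frequencies(docs):
    """Per term, count every occurrence across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(analyze(doc))
    return counts


docs = ["people also kill people"]
print(document_counts(docs)["people"])   # 1, the facet-style count
print(term_frequencies(docs)["people"])  # 2, the count the question asks for
```

With the single test document, the facet-style count for "people" is 1 while the occurrence count is 2, which is exactly the gap between the actual and the desired output in the question.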
