What is an effective way to search world-wide location names with ElasticSearch?
Problem description
I have location information provided by GeoNames.org parsed into a relational database. Using this information, I am attempting to build an ElasticSearch index that contains populated place (city) names, administrative division (state, province, etc.) names, country names and country codes. My goal is to provide a location search that is similar to Google Maps':
I don't need the cool bold highlighting, but I do need the search to return similar results in a similar way. I've tried creating a mapping with a single location field consisting of the entire location name (e.g., "Round Rock, TX, United States") and I've also tried having five separate fields consisting of each piece of a location. I've tried keyword and prefix queries and edgengram analyzers; I have been unsuccessful in finding the correct configuration to get this working properly.
What kinds of analyzers--both index and search--should I be looking at to accomplish my goals? This search doesn't have to be as perfected as Google's but I'd like it to be at least similar.
I do want to support partial-name matches, which is why I've been fiddling with edgengram. For example, a search of "round r" should match Round Rock, TX, United States. Also, I would prefer that results whose populated place (city) names begin with the exact search term be ranked higher than other results. For example, a search of "round ro" should match Round Rock, TX, United States before Round, Some Province, RO (Romania). I hope I've made this clear enough.
Here is my current index configuration (this is an anonymous type in C# that is later serialized to JSON and passed to the ElasticSearch API):
settings = new
{
    index = new
    {
        number_of_shards = 1,
        number_of_replicas = 0,
        refresh_interval = -1,
        analysis = new
        {
            analyzer = new
            {
                edgengram_index_analyzer = new
                {
                    type = "custom",
                    tokenizer = "index_tokenizer",
                    filter = new[] { "lowercase", "asciifolding" },
                    char_filter = new[] { "no_commas_char_filter" },
                    stopwords = new object[0]
                },
                search_analyzer = new
                {
                    type = "custom",
                    tokenizer = "standard",
                    filter = new[] { "lowercase", "asciifolding" },
                    char_filter = new[] { "no_commas_char_filter" },
                    stopwords = new object[0]
                }
            },
            tokenizer = new
            {
                index_tokenizer = new
                {
                    type = "edgeNGram",
                    min_gram = 1,
                    max_gram = 100
                }
            },
            char_filter = new
            {
                no_commas_char_filter = new
                {
                    type = "mapping",
                    mappings = new[] { ",=>" }
                }
            }
        }
    }
},
mappings = new
{
    location = new
    {
        _all = new { enabled = false },
        properties = new
        {
            populatedPlace = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            administrativeDivision = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            administrativeDivisionAbbreviation = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            country = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            countryCode = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            population = new { type = "long" }
        }
    }
}
Recommended answer
This may be what you're looking for:
"analysis": {
    "tokenizer": {
        "name_tokenizer": {
            "type": "edgeNGram",
            "max_gram": 100,
            "min_gram": 2,
            "side": "front"
        }
    },
    "analyzer": {
        "name_analyzer": {
            "tokenizer": "whitespace",
            "type": "custom",
            "filter": ["lowercase", "multi_words", "name_filter"]
        }
    },
    "filter": {
        "multi_words": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 10
        },
        "name_filter": {
            "type": "edgeNGram",
            "max_gram": 100,
            "min_gram": 2,
            "side": "front"
        }
    }
}
I think using name_analyzer will replicate the Google search that you are talking about. You can tweak the configuration a bit to suit your needs.
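The settings above do not say how the query text itself should be analyzed. Here is a minimal sketch of one way to wire that up, assuming the same older Elasticsearch mapping syntax used in the question (`"type": "string"`, `index_analyzer`). The field is indexed with name_analyzer but searched with an analyzer that keeps the whole query as one lowercased token, so that "round r" can be compared against the edge n-grams of the shingle "round rock". The name `name_search_analyzer` is hypothetical and not part of the original answer; only the `populatedPlace` field is shown, and name_analyzer with its two filters would need to sit in the same `analysis` section.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "name_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "location": {
      "properties": {
        "populatedPlace": {
          "type": "string",
          "index_analyzer": "name_analyzer",
          "search_analyzer": "name_search_analyzer"
        }
      }
    }
  }
}
```

Under those assumptions, a plain `match` query should be enough to try it out:

```json
{
  "query": {
    "match": {
      "populatedPlace": "round ro"
    }
  }
}
```

The query is reduced to the single term `round ro`, which should match "Round Rock" (it is one of the edge n-grams of the shingle `round rock`) but not a place named just "Round". Newer Elasticsearch versions spell several of these options differently (for example `text` instead of `string`, and `analyzer` instead of `index_analyzer`), so treat this as a starting point rather than a drop-in configuration.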