What is an effective way to search world-wide location names with ElasticSearch?

Question

I have location information provided by GeoNames.org parsed into a relational database. Using this information, I am attempting to build an ElasticSearch index that contains populated place (city) names, administrative division (state, province, etc.) names, country names and country codes. My goal is to provide a location search that is similar to the one in Google Maps.

I don't need the cool bold highlighting, but I do need the search to return similar results in a similar way. I've tried creating a mapping with a single location field consisting of the entire location name (e.g., "Round Rock, TX, United States") and I've also tried having five separate fields consisting of each piece of a location. I've tried keyword and prefix queries and edgengram analyzers; I have been unsuccessful in finding the correct configuration to get this working properly.

What kinds of analyzers, both index and search, should I be looking at to accomplish my goals? This search doesn't have to be as polished as Google's, but I'd like it to be at least similar.

I do want to support partial-name matches, which is why I've been fiddling with edgengram. For example, a search of "round r" should match Round Rock, TX, United States. Also, I would prefer that results whose populated place (city) names begin with the exact search term be ranked higher than other results. For example, a search of "round ro" should match Round Rock, TX, United States before Round, Some Province, RO (Romania). I hope I've made this clear enough.
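
To make the two requirements above concrete, here is a minimal pure-Python sketch of the desired behavior, independent of ElasticSearch. The `Location` type, the `score` tiers, and the sample records are all hypothetical and only illustrate the ranking rule described: a city name that begins with the exact search term outranks a record that merely matches each query word as a prefix.

```python
from typing import List, NamedTuple


class Location(NamedTuple):
    # Hypothetical record shape, loosely based on the fields in the question.
    city: str
    division: str
    country_code: str


def normalize(text: str) -> str:
    # Lowercase and drop commas, then collapse runs of whitespace.
    return " ".join(text.lower().replace(",", " ").split())


def score(query: str, loc: Location) -> int:
    q = normalize(query)
    if normalize(loc.city).startswith(q):
        return 2  # city name begins with the exact search term
    words = normalize(" ".join(loc)).split()
    if all(any(w.startswith(t) for w in words) for t in q.split()):
        return 1  # every query word is a prefix of some word in the record
    return 0  # no match


def search(query: str, locations: List[Location]) -> List[Location]:
    hits = [(score(query, loc), loc) for loc in locations]
    hits = [h for h in hits if h[0] > 0]
    hits.sort(key=lambda h: -h[0])  # stable sort: ties keep input order
    return [loc for _, loc in hits]


locations = [
    Location("Round", "Some Province", "RO"),
    Location("Round Rock", "TX", "US"),
    Location("Austin", "TX", "US"),
]
results = search("round ro", locations)
```

With these sample records, "round ro" returns Round Rock first and Round second, because only Round Rock's city name starts with the whole query string. An index-side solution needs analyzers that reproduce this distinction.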

Here is my current index configuration (this is an anonymous type in C# that is later serialized to JSON and passed to the ElasticSearch API):

settings = new
{
    index = new
    {
        number_of_shards = 1,
        number_of_replicas = 0,
        refresh_interval = -1,
        analysis = new
        {
            analyzer = new
            {
                edgengram_index_analyzer = new
                {
                    type = "custom",
                    tokenizer = "index_tokenizer",
                    filter = new[] { "lowercase", "asciifolding" },
                    char_filter = new[] { "no_commas_char_filter" },
                    stopwords = new object[0]
                },
                search_analyzer = new
                {
                    type = "custom",
                    tokenizer = "standard",
                    filter = new[] { "lowercase", "asciifolding" },
                    char_filter = new[] { "no_commas_char_filter" },
                    stopwords = new object[0]
                }
            },
            tokenizer = new
            {
                index_tokenizer = new
                {
                    type = "edgeNGram",
                    min_gram = 1,
                    max_gram = 100
                }
            },
            char_filter = new
            {
                no_commas_char_filter = new
                {
                    type = "mapping",
                    mappings = new[] { ",=>" }
                }
            }
        }
    }
},
mappings = new
{
    location = new
    {
        _all = new { enabled = false },
        properties = new
        {
            populatedPlace = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            administrativeDivision = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            administrativeDivisionAbbreviation = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            country = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            countryCode = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
            population = new { type = "long" }
        }
    }
}


Recommended answer

This may be what you are looking for:

  "analysis": {
    "tokenizer": {
      "name_tokenizer": {
        "type": "edgeNGram",
        "max_gram": 100,
        "min_gram": 2,
        "side": "front"
      }
    },
    "analyzer": {
      "name_analyzer": {
        "tokenizer": "whitespace",
        "type": "custom",
        "filter": ["lowercase", "multi_words", "name_filter"]
      }
    },
    "filter": {
      "multi_words": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 10
      },
      "name_filter": {
        "type": "edgeNGram",
        "max_gram": 100,
        "min_gram": 2,
        "side": "front"
      }
    }
  }

I think using name_analyzer will replicate the Google-style search that you are talking about. You can tweak the configuration a bit to suit your needs.
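
To see why this filter chain supports partial matches such as "round r", the following pure-Python sketch approximates the terms that name_analyzer would index for a city name. It is a simulation rather than ElasticSearch itself: it assumes the shingle filter also emits single words (its default `output_unigrams` behavior), and the order of the emitted terms may differ from what ElasticSearch produces. The function names are made up for illustration.

```python
from typing import List


def shingles(tokens: List[str], min_size: int = 2, max_size: int = 10) -> List[str]:
    # Emit each single word plus every run of min_size..max_size adjacent words,
    # assuming the shingle filter's default of also outputting unigrams.
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 100) -> List[str]:
    # Front-anchored prefixes of the token, from min_gram up to max_gram characters.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def name_analyzer(text: str) -> List[str]:
    # whitespace tokenizer -> lowercase -> shingle (multi_words) -> edgeNGram (name_filter)
    tokens = [t.lower() for t in text.split()]
    terms = []
    for shingle in shingles(tokens):
        terms.extend(edge_ngrams(shingle))
    return terms


terms = set(name_analyzer("Round Rock"))
```

The indexed terms include "round r" and "round ro" because the shingle "round rock" is expanded into prefixes that span the space. For a query to hit those terms, the search side would have to keep the input as one lowercased term (for example a keyword tokenizer with a lowercase filter); that is an assumption, since the answer does not specify a search analyzer. A single-character query such as "r" would not match here because `min_gram` is 2.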
