Querying an Elasticsearch Address-Based Index


Problem description

I'm having a really hard time getting an address-based index to return results the way an autocomplete does. I have tried two different methods. I started out with nGrams and custom analyzers, but I have really struggled to get results relevant enough to match what one would expect from an address autocomplete.

The second method I have focused on is the completion suggester that Elasticsearch ships with, to see whether it would be any easier to get working, but I seem to be hitting a roadblock in every direction.

We send regular client-side API calls based on the input value on every key-up.

The issue I seem to face is twofold: either I'm not returning relevant enough results, or, when they are relevant, one additional character or partial word can cause no results to be returned at all.

Take the following address as an example: 7 West Hill Gardens, West Hill EX9 6BL

My documents are stored as follows:

"id": "1",
"address": "7, Westhill Gardens, Bromyard HR74HW",
"suggest": "7, Westhill Gardens, Bromyard HR74HW"






Completion suggester mappings:

{
  "mappings": {
    "addresses": {
      "properties": {
        "suggest": {
          "type": "completion",
          "preserve_separators": false,
          "analyzer": "standard",
          "search_analyzer": "standard"
        },
        "address": {
          "type": "text"
        },
        "id": {
          "type": "keyword"
        }
      }
    }
  }
}






Note that I set preserve_separators to false in the suggester so that west hill also matches westhill. This works fine in the suggester. In my nGram index, however, I'm unsure how to enable the same behaviour through mappings, and I believe that may be part of the reason I'm not getting relevant results.

With the suggester, when I query for 7 westhill gardens using the following query:

{
  "suggest": {
    "suggestions": {
      "prefix": "7 westhill gardens",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2 // Also tried with no fuzzy and fuzziness: 1
        }
      }
    }
  }
}

the following results are returned:

"address": "7, Westhill Gardens, Brackley NN136AA",
"address": "7, Westhill Gardens, Bromyard HR74HW",
"address": "7, West Hill Gardens, West Hill, Budleigh Salterton EX96BL",

However, if I remove the number 7 from the query and run it again, it returns no results. This is a key issue, because not all users will start their query with the house number, and it is quite common to search for west hill gardens as opposed to 7 west hill gardens.

{
  "suggest": {
    "suggestions": {
      "prefix": "westhill gardens",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

And lastly, if I query for just the house number and postcode as shown below, no results are returned.

{
  "suggest": {
    "suggestions": {
      "prefix": "7 EX9 6BL",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}



I'm hoping someone with more experience than me can share some thoughts on what the best approach would be, and on whether I should stick with nGrams and try to get a custom analyzer / filter approach working. Or am I just doing it totally wrong?! I have only just started to learn Elasticsearch, so I apologise if my terminology is incorrect.

Recommended answer

Think of the completion suggester more as a "starts with ..." mechanism. The documentation says: "The completion suggester is a so-called prefix suggester." So with this type of search you probably cannot have everything you want.
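To make the "starts with" behaviour concrete, here is a rough Python sketch of prefix matching over analyzed text. It is a simplified model only, not how Elasticsearch implements the suggester: analyze_standard_like, suggest_key and prefix_match are hypothetical helpers, the regex is just a stand-in for the standard analyzer, tokens are joined without separators to imitate preserve_separators: false, and fuzziness is ignored.

```python
import re

def analyze_standard_like(text):
    """Lowercase and keep runs of letters/digits (rough stand-in for the standard analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def suggest_key(text):
    """Join tokens without separators, imitating preserve_separators: false."""
    return "".join(analyze_standard_like(text))

def prefix_match(query, stored):
    """True when the analyzed query is a prefix of the analyzed stored input."""
    return suggest_key(stored).startswith(suggest_key(query))

stored = "7, Westhill Gardens, Bromyard HR74HW"
print(prefix_match("7 westhill gardens", stored))  # True: the query starts with the house number
print(prefix_match("westhill gardens", stored))    # False: the stored input starts with "7"
```

Under this model the stored input always begins with the house number, so any query that skips it cannot be a prefix, which is consistent with the empty results described in the question.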

To get a bit closer, one solution is a combination of preserve_position_increments and a stop-word analyzer. First create the index with the following settings:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop"
        }
      }
    }
  }
}

and then the mapping for the document type:

{
  "properties": {
    "suggest": {
      "type": "completion",
      "preserve_separators": false,
      "preserve_position_increments": false
    },
    "address": {
      "type": "text"
    },
    "id": {
      "type": "keyword"
    }
  }
}

Then this query:

{
  "suggest": {
    "suggestions": {
      "prefix": "westhill gardens",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

returns both:

"address": "5, West hill Gardens, Bromyard AAA"
"address": "7, Westhill Gardens, Bromyard HR74HW"

But if you search for "prefix": "7 gardens", it won't give you any results (because of the prefix-suggester nature of this mechanism).
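The two outcomes above can be reproduced with a small Python model. It assumes the suggest field ends up analyzed by a letters-only analyzer (the stop and simple analyzers both tokenize on non-letters), so digits and punctuation disappear before the prefix comparison. The helper names are hypothetical, and stop-word removal and fuzziness are left out.

```python
import re

def analyze_letters_only(text):
    """Lowercase and split on anything that is not a letter, so digits and punctuation vanish."""
    return re.findall(r"[a-z]+", text.lower())

def prefix_match(query, stored):
    """Prefix comparison on the joined tokens (no separators, no position gaps)."""
    q = "".join(analyze_letters_only(query))
    s = "".join(analyze_letters_only(stored))
    return bool(q) and s.startswith(q)

docs = [
    "5, West hill Gardens, Bromyard AAA",
    "7, Westhill Gardens, Bromyard HR74HW",
]
for query in ("westhill gardens", "7 gardens"):
    hits = [d for d in docs if prefix_match(query, d)]
    print(query, "->", hits)
```

In this model "westhill gardens" matches both documents because the house number is no longer part of the analyzed input, while "7 gardens" reduces to "gardens", which is not at the start of either address.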

What could be another option? nGrams, as already mentioned, or you could also experiment with query_string. As a simple example, let's say you have a standard mapping:

{
  "properties": {
    "suggest": {
      "type": "text"
    },
    "address": {
      "type": "text"
    },
    "id": {
      "type": "keyword"
    }
  }
}

Then a query using query_string:

{
  "query": {
    "query_string": {
      "default_field": "suggest",
      "query": "west* Gardens*",
      "default_operator": "OR",
      "split_on_whitespace": "true",
      "fuzziness": 2
    }
  }
}

gives me results such as:

"address": "267, Westhill Gardens, Bromyard HR74HW",
"address": "5, West hill Gardens, Bromyard AAA",
"address": "1, West hill Bromyard HR74HW"

Note, however, that using the * wildcard costs performance and memory (and definitely avoid putting * at the beginning of a term). On the other hand, query_string is a very versatile tool.
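As a sketch of how the client side might build such a query from raw input, the Python helper below appends a trailing * to each whitespace-separated term and strips any * the user typed, so no term ends up with a leading wildcard. build_wildcard_query is a hypothetical name, the field default is taken from the mapping above, and escaping of other query_string reserved characters is deliberately not handled here.

```python
import json

def build_wildcard_query(user_input, field="suggest"):
    """Turn raw input into a query_string body with a trailing * on each term."""
    terms = []
    for raw in user_input.split():
        term = raw.strip("*")  # drop any * the user typed, so none leads a term
        if term:
            terms.append(term + "*")
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": " ".join(terms),
                "default_operator": "OR",
            }
        }
    }

print(json.dumps(build_wildcard_query("west *Gardens"), indent=2))
```

The resulting dictionary can be serialized and sent as the search request body; an empty or whitespace-only input yields an empty query string, which the caller would probably want to skip.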

Since I mentioned nGrams earlier, here is a first idea for that approach.

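Some initial assumptions:

  • autocomplete kicks in after 3 characters have been typed (setting: min_gram: 3)
  • digits, spaces, commas and so on need to be analyzed: if the user types "7, W", we need to get a set of results
  • for testing, term vectors are enabled on the ngram field, which lets you see how it works (setting: term_vector: yes), but this should be disabled in production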

The mapping, for index and type, looks like this:

{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "tokenizer": {
            "ngram_tokenizer": {
               "type": "nGram",
               "min_gram": 3,
               "max_gram": 10
            }
         },
         "analyzer": {
            "ngram_tokenizer_analyzer": {
               "type": "custom",
               "tokenizer": "ngram_tokenizer"
            }
         }
      }
   },
   "mappings": {
      "addresses": {
         "properties": {
            "suggest": {
               "type": "text",
               "term_vector": "yes",
               "analyzer": "ngram_tokenizer_analyzer"
            },
            "address": {
              "type": "text"
            },
            "id": {
              "type": "keyword"
            }
         }
      }
   }
}
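To get a feel for what this tokenizer emits, here is a small Python approximation. It assumes that, with no token_chars configured, every character (digits, commas and spaces included) is kept, and that grams are produced per start position from min_gram up to max_gram. char_ngrams is a hypothetical helper, not an Elasticsearch API.

```python
def char_ngrams(text, min_gram=3, max_gram=10):
    """All substrings of length min_gram..max_gram, ordered by start position."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break  # longer grams would run past the end of the text
            grams.append(text[start:start + size])
    return grams

print(char_ngrams("7, W"))  # ['7, ', '7, W', ', W']
print(len(char_ngrams("7, Westhill Gardens")))
```

Two things stand out in this sketch: input shorter than min_gram produces no grams at all, and, since the analyzer above defines no lowercase filter, the grams are case-sensitive.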

Now a document can be indexed. You can check how the analyzer works (thanks to "term_vector": "yes") with:

GET http://127.0.0.1:9200/sug/addresses/{documentId}/_termvector?fields=suggest

After that, the query (a bool query this time) is really simple:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "suggest": { "query": "1, Westhil" } } }
      ]
    }
  }
}
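A rough way to picture why this match query finds partial input: the query text is broken into the same character n-grams as the indexed field, and any document sharing at least one gram is a candidate. The Python sketch below ranks a few made-up addresses by the number of shared grams. That count is only a stand-in for relevance (real scoring is more involved), and the helper names and sample documents are hypothetical.

```python
def char_ngrams(text, min_gram=3, max_gram=10):
    """Set of all substrings of length min_gram..max_gram."""
    return {
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    }

def shared_grams(query, stored):
    """Number of distinct grams the query and the stored text have in common."""
    return len(char_ngrams(query) & char_ngrams(stored))

docs = [
    "1, Westhill Gardens, Bromyard HR74HW",
    "7, Westhill Gardens, Bromyard HR74HW",
    "12, High Street, Leeds LS11AA",
]
query = "1, Westhil"
ranked = sorted(docs, key=lambda d: shared_grams(query, d), reverse=True)
for d in ranked:
    print(shared_grams(query, d), d)
```

The address containing the whole query shares the most grams, the one differing only in house number still shares many, and the unrelated address shares none.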

I think this should meet all the requirements you described: searching by the starting part of the address, by house number or any other part, and also the issue with spaces. You can decrease min_gram to 2 if that is really necessary. If you need more detail, feel free to ask or, as you suggested, open a new question.
