Elasticsearch wildcard query to get sorted results


Question

I have an Elasticsearch server where I store company names for a company search. It works as follows:

Spaces and dots are removed from the company name, and the result is stored in ES in a field called trimmedCompanyName:

{
  "companyName" : "RECKON INFOSYSTEM PRIVATE LIMITED",
  "trimmedCompanyName" : "reckoninfosystemprivatelimited",
  "id" : "1079"
}
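The trimming step can be sketched as follows. This is a minimal illustration, not the asker's actual code: the function name `trim_company_name` is hypothetical, and the lowercasing is an assumption inferred from the sample document above, since the text itself only mentions removing spaces and dots.

```python
def trim_company_name(name: str) -> str:
    """Remove spaces and dots from a company name.

    Lowercasing is an assumption based on the sample document,
    where the stored value is all lowercase.
    """
    return name.replace(" ", "").replace(".", "").lower()


# Build a document shaped like the sample above.
doc = {
    "companyName": "RECKON INFOSYSTEM PRIVATE LIMITED",
    "trimmedCompanyName": trim_company_name("RECKON INFOSYSTEM PRIVATE LIMITED"),
    "id": "1079",
}
print(doc["trimmedCompanyName"])
```

The same function would be applied to the incoming search text before it is sent to ES, so that the indexed value and the search value are normalized identically.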

When a search request reaches my server, I remove the spaces and dots from the search text and then send a request to the ES server. The ES request in query format is:

GET /_search
{
  "from": 0,
  "size": 100,
  "query": {
    "wildcard": {
      "trimmedCompanyName.keyword": {
        "value": "*infosys*"
      }
    }
  }
}

However, around 600 companies have infosys in their name, all stored with spaces removed. ES returns 100 of them, but in those 100 results infosys often appears at the start of the second or third word. I want the results to list companies that have infosys in the first word first, then those that have it in the second word, and so on.

One solution I could think of is to fire two ES requests, one with the wildcard query infosys* and one with *infosys*, then combine the results, remove the duplicates, and return the response. But this has to work with pagination, so firing two requests can go wrong. Can someone please help me with this?

Recommended answer

First of all, for corpus (text) data, the traditional similarity algorithms and queries we use in ES do not take the position of terms into account when calculating relevancy.

For position-based queries, you need to use span queries.
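As an illustration of what a position-aware span query can look like, here is a hedged sketch that builds a `span_first` request body in Python. It is not the query used in this answer: `span_first`, `span_multi`, and `wildcard` are standard Elasticsearch query types, but the helper name `first_token_prefix_query` is hypothetical, and the sketch assumes the field is analyzed into lowercase tokens (as the Standard Analyzer does).

```python
import json


def first_token_prefix_query(field: str, prefix: str) -> dict:
    """Build a body that matches only documents whose FIRST token starts with `prefix`.

    Wildcard patterns are not analyzed, so the prefix is lowercased here on the
    assumption that the indexed tokens are lowercase.
    """
    return {
        "query": {
            "span_first": {
                # The inner span query: any term starting with the prefix.
                "match": {
                    "span_multi": {
                        "match": {
                            "wildcard": {field: {"value": f"{prefix.lower()}*"}}
                        }
                    }
                },
                # The matching span must end at or before position 1,
                # i.e. it must be the first token of the field.
                "end": 1,
            }
        }
    }


body = first_token_prefix_query("companyName", "Infosys")
print(json.dumps(body, indent=2))
```

Raising `end` to 2 or 3 would also admit matches in the second or third word, which is the kind of positional control that plain wildcard queries do not offer.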

I came up with the solution below, which should work in your case. Note that the query runs against the field companyName, and I assume that this field uses the Standard Analyzer.

Below are the mapping, the sample documents, the query, and the response:

PUT my_company
{
  "mappings": {
    "properties": {
      "companyName":{
        "type":"text"
      }
    }
  }
}

Sample documents:

POST my_company/_doc/1
{
  "companyName": "reckon infosystem private limited"
}

POST my_company/_doc/2
{
  "companyName": "infosys"
}

POST my_company/_doc/3
{
  "companyName": "telecom services infosystem private limited"
}

POST my_company/_doc/4
{
  "companyName":"infosystems technological solution"
}

Query:

POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "span_multi": {
            "match": {
              "wildcard": {
                "companyName": "infosys*"
              }
            }
          }
        }
      ]
    }
  }
}
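Because the question requires pagination, the same query body can be generated per page on the application side. The sketch below is one possible way to do that, under stated assumptions: the helper name `company_search_body` and its `page` and `page_size` parameters are hypothetical, while `from` and `size` are the standard Elasticsearch paging fields already used in the question.

```python
def company_search_body(prefix: str, page: int = 0, page_size: int = 100) -> dict:
    """Build the span_multi search body from this answer for a given result page.

    `page` is zero-based. The prefix is lowercased on the assumption that
    companyName is indexed with the Standard Analyzer (lowercase tokens).
    """
    return {
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,         # number of hits per page
        "query": {
            "bool": {
                "must": [
                    {
                        "span_multi": {
                            "match": {
                                "wildcard": {"companyName": f"{prefix.lower()}*"}
                            }
                        }
                    }
                ]
            }
        },
    }


first_page = company_search_body("infosys")
second_page = company_search_body("infosys", page=1)
print(first_page["from"], second_page["from"])
```

Since a single query produces the ordering, every page comes from the same ranked list, which avoids the merge-and-deduplicate problem described in the question.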

You might be wondering why I have not used the field trimmedCompanyName. Because the spaces are removed before indexing, each value in that field is treated as a single term and stored that way in the inverted index, even if the field is of type text with the Standard Analyzer. A single term has no word positions for a span query to work with.

Response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 4.3264027,
    "hits" : [
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 4.3264027,
        "_source" : {
          "companyName" : "infosys"
        }
      },
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 3.2018504,
        "_source" : {
          "companyName" : "infosystems technological solution"
        }
      },
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.8335867,
        "_source" : {
          "companyName" : "reckon infosystem private limited"
        }
      },
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.5412967,
        "_source" : {
          "companyName" : "telecom services infosystem private limited"
        }
      }
    ]
  }
}

Let me know if this helps!
