Elastic Search: Query string and number not always returning wanted result


Question

We have an Elasticsearch 5.5 setup. We use NEST to perform our queries through C#.

When executing the following query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "00917751"
          }
        }
      ]
    }
  }
}

we get the desired result: a single hit that has that number as its identifier.

When executing this query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "917751"
          }
        }
      ]
    }
  }
}

we get no results.

The value we are searching for is stored in the field searchIndentifier and is "1-00917751".

We have a custom analyzer called "final":

.Custom("final", cu => cu.Tokenizer("keyword").Filters(new List<string> { "lowercase" }))

The field searchIndentifier has no custom analyzer set on it. I tried giving it an analyzer that uses the whitespace tokenizer, but that made no difference.

Another field, "searchObjectNo", does work: searching for the value "S328-25" with the query "S328" finds it. The two fields are configured exactly the same.

Any ideas?

Another related question: when executing the query

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "1-00917751"
          }
        }
      ]
    }
  }
}

we get a lot of results. I would like this to return only one result. How would we accomplish this?

Thanks cho

Settings and mappings: https://jsonblob.com/9dbf33f6-cd3e-11e8-8f17-c9de91b6f9d1

Recommended answer

The searchIndentifier field is mapped as a text datatype, so it undergoes analysis and uses the Standard Analyzer by default. Using the Analyze API, you can see which terms will be stored in the inverted index for 1-00917751:

var client = new ElasticClient();

var analyzeResponse = client.Analyze(a => a
    .Text("1-00917751")
);

which returns

{
  "tokens" : [
    {
      "token" : "1",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<NUM>",
      "position" : 0
    },
    {
      "token" : "00917751",
      "start_offset" : 2,
      "end_offset" : 10,
      "type" : "<NUM>",
      "position" : 1
    }
  ]
}

You get a match for the query_string query with the input 00917751 because it matches one of the terms that analysis stored in the inverted index when 1-00917751 was indexed.

You won't get a match for 917751 because there is no term in the inverted index that matches it. You could define an analysis chain that removes leading zeroes from numbers as well as preserving the original token, e.g.

private static void Main()
{
    var defaultIndex = "foobarbaz";
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

    var settings = new ConnectionSettings(pool)
        .DefaultIndex(defaultIndex);

    var client = new ElasticClient(settings);

    client.CreateIndex(defaultIndex, c => c
        .Settings(s => s
            .Analysis(a => a
                .Analyzers(an => an
                    .Custom("trim_leading_zero", ca => ca
                        .Tokenizer("standard")
                        .Filters(
                            "standard", 
                            "lowercase", 
                            "trim_leading_zero",
                            "trim_zero_length")
                    )
                )
                .TokenFilters(tf => tf
                    .PatternReplace("trim_leading_zero", pr => pr
                        .Pattern("^0+(.*)")
                        .Replacement("$1")
                    )
                    .Length("trim_zero_length", t => t
                        .Min(1)
                    )
                )
            )
        )
        .Mappings(m => m
            .Map<MyDocument>(mm => mm
                .AutoMap()
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.SearchIndentifier)
                        .Analyzer("trim_leading_zero")
                        .Fields(f => f
                            .Keyword(k => k
                                .Name("keyword")
                                .IgnoreAbove(256)
                            )
                        )
                    )
                )
            )
        )
    );

    client.Index(new MyDocument { SearchIndentifier = "1-00917751" }, i => i
        .Refresh(Refresh.WaitFor)
    );

    client.Search<MyDocument>(s => s
        .Query(q => q
            .QueryString(qs => qs
                .Query("917751")
            )
        )
    );
}

public class MyDocument 
{
    public string SearchIndentifier { get; set; }
}

The pattern_replace token filter trims leading zeroes from tokens, and the length filter then drops any token that was made up only of zeroes and is now empty.
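To make the effect of that chain concrete, here is a small sketch that imitates it in plain C#, without talking to Elasticsearch. It is an approximation under stated assumptions: the `Tokenize` helper is a hypothetical stand-in that splits on non-alphanumeric characters (the real standard tokenizer follows Unicode text segmentation rules), and the match check assumes the query text is analyzed with the same chain as the field, which depends on which field the query actually targets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TrimLeadingZeroSketch
{
    // Hypothetical stand-in for the standard tokenizer:
    // split on anything that is not a letter or a decimal digit.
    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Split(text, "[^\\p{L}\\p{Nd}]+").Where(t => t.Length > 0);

    // Imitates the filter chain: lowercase, trim leading zeroes, drop empty tokens.
    public static List<string> Analyze(string text) =>
        Tokenize(text)
            .Select(t => t.ToLowerInvariant())
            .Select(t => Regex.Replace(t, "^0+(.*)", "$1")) // same pattern as trim_leading_zero
            .Where(t => t.Length >= 1)                      // same rule as trim_zero_length
            .ToList();

    public static void Main()
    {
        // Terms that would end up in the (simulated) inverted index.
        var indexed = Analyze("1-00917751");
        Console.WriteLine(string.Join(", ", indexed)); // prints "1, 917751"

        // Both spellings of the number reduce to the same term, so both match.
        foreach (var query in new[] { "917751", "00917751" })
        {
            var matches = Analyze(query).Any(t => indexed.Contains(t));
            Console.WriteLine($"{query} -> {(matches ? "match" : "no match")}");
        }
    }
}
```

With the unmodified standard analyzer the indexed terms would be 1 and 00917751, which is why 917751 found nothing in the original setup.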

The search query returns the indexed document:

{
  "took" : 69,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.33310556,
    "hits" : [
      {
        "_index" : "foobarbaz",
        "_type" : "mydocument",
        "_id" : "MVF4bmYBJZHQAT-BUx1K",
        "_score" : 0.33310556,
        "_source" : {
          "searchIndentifier" : "1-00917751"
        }
      }
    ]
  }
}
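The answer above does not cover the related question of getting a single hit for the full value 1-00917751. A likely cause of the many results is that analysis splits the query into the terms 1 and 00917751, and the term 1 occurs in many documents. One possible approach, sketched here and not taken from the answer itself, is a term query against the unanalyzed keyword sub-field defined in the mapping above. The node address and the index name "foobarbaz" are assumptions carried over from the example; the original index may not have such a sub-field.

```csharp
using System;
using Nest;

public class MyDocument
{
    public string SearchIndentifier { get; set; }
}

public static class ExactMatchSketch
{
    public static void Main()
    {
        // Assumption: a local node and the "foobarbaz" index from the example above.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("foobarbaz");
        var client = new ElasticClient(settings);

        // A term query is not analyzed, so the whole value "1-00917751"
        // is compared against the searchIndentifier.keyword sub-field.
        var response = client.Search<MyDocument>(s => s
            .Query(q => q
                .Term(t => t
                    .Field(f => f.SearchIndentifier.Suffix("keyword"))
                    .Value("1-00917751")
                )
            )
        );

        Console.WriteLine(response.Total);
    }
}
```

Because the keyword sub-field stores the value verbatim, this only matches documents whose identifier is exactly 1-00917751, including case and punctuation.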
