How to make an accent-insensitive search in Elasticsearch with the NEST C# client?


Question




I'm an Elasticsearch newbie.

Let's say we have a class like this:

public class A
{
    public string name;
}

And we have 2 documents with names like "Ayşe" and "Ayse".

Now, I want to be able to store names with their accents, but when I search I want an accent-insensitive query to return the names as they are stored, accents included.

For example: when I search for "Ayse" or "Ayşe", it should return both "Ayşe" and "Ayse" as they are stored (with accents).

Right now, when I search for "Ayse" it only returns "Ayse", but I want "Ayşe" as a result too.

When I checked the Elasticsearch documentation, I saw that a folded field is needed to achieve that, but I couldn't understand how to set it up with NEST attributes/functions.

BTW, I'm using AutoMap to create mappings right now and, if possible, I want to keep using it.

I've been searching for an answer for 2 days now and haven't figured it out yet.

What changes are required, and where? Can you provide code sample(s)?

Thank you.

EDIT 1:

I figured out how to use analyzers to create sub-fields of a property and get results with a term-based query against the sub-fields.

Now, I know I can do a multi-field search, but is there a way to include sub-fields in a full-text search?

Thank you.

Solution

You can configure an analyzer to perform analysis on the text at index time, index this into a multi_field to use at query time, as well as keep the original source to return in the result. Based on what you have in your question, it sounds like you want a custom analyzer that uses the asciifolding token filter to convert to ASCII characters at index and search time.

Given the following document

public class Document
{
    public int Id { get; set;}
    public string Name { get; set; }
}

Setting up a custom analyzer can be done when an index is created; we can also specify the mapping at the same time

client.CreateIndex(documentsIndex, ci => ci
    .Settings(s => s
        .NumberOfShards(1)
        .NumberOfReplicas(0)
        .Analysis(analysis => analysis
            .TokenFilters(tokenfilters => tokenfilters
                .AsciiFolding("folding-preserve", ft => ft
                    .PreserveOriginal()
                )
            )
            .Analyzers(analyzers => analyzers
                .Custom("folding-analyzer", c => c
                    .Tokenizer("standard")
                    .Filters("standard", "folding-preserve")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<Document>(mm => mm
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.Name)
                    .Fields(f => f
                        .String(ss => ss
                            .Name("folding")
                            .Analyzer("folding-analyzer")
                        )
                    )
                    .NotAnalyzed()
                )
            )
        )
    )
);

Here I've created an index with one shard and no replicas (you may want to change this for your environment), and a custom analyzer, folding-analyzer, that uses the standard tokenizer in conjunction with the standard token filter and a folding-preserve token filter. The folding-preserve filter performs ASCII folding and stores the original tokens in addition to the folded tokens (more on why this may be useful in a minute).

I've also mapped the Document type, mapping the Name property as a multi_field, with default field not_analyzed (useful for aggregations) and a .folding sub-field that will be analyzed with the folding-analyzer. The original source document will also be stored by Elasticsearch by default.

Now let's index some documents

client.Index<Document>(new Document { Id = 1, Name = "Ayse" });
client.Index<Document>(new Document { Id = 2, Name = "Ayşe" });

// refresh the index after indexing to ensure the documents just indexed are
// available to be searched
client.Refresh(documentsIndex);

Finally, searching for Ayşe

var response = client.Search<Document>(s => s
    .Query(q => q
        .QueryString(qs => qs
            .Fields(f => f
                .Field(c => c.Name.Suffix("folding"))
            )
            .Query("Ayşe")
        )
    )
);

yields

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.163388,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "2",
      "_score" : 1.163388,
      "_source" : {
        "id" : 2,
        "name" : "Ayşe"
      }
    }, {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.3038296,
      "_source" : {
        "id" : 1,
        "name" : "Ayse"
      }
    } ]
  }
}

Two things to highlight here:

Firstly, the _source contains the original text that was sent to Elasticsearch so by using response.Documents, you will get the original names, for example

string.Join(",", response.Documents.Select(d => d.Name));

would give you "Ayşe,Ayse"

Secondly, remember that we preserved the original tokens in the asciifolding token filter? Doing so means that we can perform queries that undergo analysis to match accent-insensitively, but also take accent sensitivity into account when it comes to scoring. In the example above, the score for the document Ayşe matching the query Ayşe is higher than for the document Ayse matching the query Ayşe, because the tokens Ayşe and Ayse are indexed for the former whilst only Ayse is indexed for the latter. When a query that undergoes analysis is performed against the folding sub-field of the Name property, the query is analyzed with the folding-analyzer and a search for matches is performed:

Index time
----------

document 1 name: Ayse --analysis--> Ayse

document 2 name: Ayşe --analysis--> Ayşe, Ayse  


Query time
-----------

query_string query input: Ayşe --analysis--> Ayşe, Ayse

search for documents with tokens for name field matching Ayşe or Ayse 
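
The index-time and query-time flow above can be sketched in plain C#, without a cluster. This is only an illustration of the idea, not Elasticsearch's implementation: the class FoldingSketch and its methods are hypothetical names, and folding is approximated by Unicode decomposition plus removal of combining marks, which covers ş but not every character that the real asciifolding filter handles.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FoldingSketch
{
    // Approximates ASCII folding: decompose (NFD), then drop combining marks.
    public static string Fold(string token)
    {
        var decomposed = token.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (var ch in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Mimics preserve_original: keep the original token and, when folding
    // changes it, the folded token as well.
    public static IReadOnlyList<string> Analyze(string token)
    {
        var folded = Fold(token);
        return folded == token
            ? new[] { token }
            : new[] { token, folded };
    }

    // A document matches when it shares at least one token with the query.
    public static bool Matches(string indexedName, string queryText) =>
        Analyze(indexedName).Intersect(Analyze(queryText)).Any();
}
```

With this sketch, FoldingSketch.Matches("Ayşe", "Ayse") and FoldingSketch.Matches("Ayse", "Ayşe") are both true. The stored Ayşe shares two tokens with the query Ayşe, while the stored Ayse shares only one, which mirrors the score difference described above.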
