How to make an accent insensitive search in elasticsearch with nest c# client?
Problem description

I'm an elasticsearch newbie.

Let's say we have a class like this:

public class A
{
    public string name;
}

And we have 2 documents which have names like "Ayşe" and "Ayse".

Now, I want to be able to store names with their accents, but when I search I want an accent-insensitive query to return the names as they were stored.

For example: when I search for "Ayse" or "Ayşe", it should return both "Ayşe" and "Ayse" as they are stored (with accents).

Right now when I search for "Ayse" it only returns "Ayse", but I want to have "Ayşe" as a result too.

When I checked the elasticsearch documentation, I saw that folded properties need to be used to achieve that. But I couldn't understand how to do it with Nest attributes / functions.

BTW I'm using AutoMap to create mappings right now, and if it is possible I want to be able to continue to use it.

I've been searching for an answer for 2 days now and couldn't figure it out yet.

What changes are required, and where? Can you provide me code sample(s)?

Thank you.

EDIT 1:

I figured out how to use analyzers to create sub-fields of a property and get results with a term-based query against the sub-fields.

Now, I know I can do a multi-field search, but is there a way to include sub-fields in a full text search?

Thank you.

Answer

You can configure an analyzer to perform analysis on the text at index time, index this into a multi_field to use at query time, as well as keep the original source to return in the result.

Based on what you have in your question, it sounds like you want a custom analyzer that uses the asciifolding token filter. The example works with a Document type that has Id and Name properties.

Setting up a custom analyzer can be done when an index is created; we can also specify the mapping at the same time. Here I've created an index with one shard and no replicas (you may want to change this for your environment), and have created a custom analyzer, folding-analyzer. I've also mapped the Document type. I then indexed two documents, named Ayse and Ayşe; searching for Ayşe returns both of them.

Two things to highlight here:

Firstly, the _source contains the original text, so the names read from response.Documents would give you "Ayşe,Ayse".

Secondly, remember that we preserved the original tokens in the asciifolding token filter? Doing so means that we can perform queries that undergo analysis to match accent insensitively but also take into account accent sensitivity when it comes to scoring; in this example, the score for Ayşe matching Ayşe is higher than for Ayse matching Ayşe because the tokens Ayşe and Ayse are indexed for the former whilst only Ayse is indexed for the latter. Queries that undergo analysis against the Name property are themselves analyzed with the folding-analyzer.
The asciifolding token filter is used to convert to ASCII characters at index and search time. The example uses the following document:

public class Document
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// documentsIndex is assumed to hold the index name (the search response shows "documents")
client.CreateIndex(documentsIndex, ci => ci
    .Settings(s => s
        .NumberOfShards(1)
        .NumberOfReplicas(0)
        .Analysis(analysis => analysis
            .TokenFilters(tokenfilters => tokenfilters
                // asciifolding filter that keeps the original token alongside the folded one
                .AsciiFolding("folding-preserve", ft => ft
                    .PreserveOriginal()
                )
            )
            .Analyzers(analyzers => analyzers
                .Custom("folding-analyzer", c => c
                    .Tokenizer("standard")
                    .Filters("standard", "folding-preserve")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<Document>(mm => mm
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.Name)
                    .Fields(f => f
                        // "folding" sub-field, analyzed with the custom analyzer
                        .String(ss => ss
                            .Name("folding")
                            .Analyzer("folding-analyzer")
                        )
                    )
                    .NotAnalyzed()
                )
            )
        )
    )
);

The custom analyzer, folding-analyzer, uses the standard tokenizer in conjunction with the standard token filter and a folding-preserve token filter that performs ASCII folding, storing the original tokens in addition to the folded tokens (this is useful when it comes to scoring).

The Document type is mapped with the Name property as a multi_field, with the default field not_analyzed (useful for aggregations) and a .folding sub-field that will be analyzed with the folding-analyzer. The original source document will also be stored by Elasticsearch by default.

Indexing some documents:

client.Index<Document>(new Document { Id = 1, Name = "Ayse" });
client.Index<Document>(new Document { Id = 2, Name = "Ayşe" });

// refresh the index after indexing to ensure the documents just indexed are
// available to be searched
client.Refresh(documentsIndex);
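
To make the effect of the folding-preserve filter more concrete, here is a minimal, self-contained C# sketch of folding with the original token preserved. It does not use NEST or Elasticsearch: FoldingSketch, its tiny FoldTable, and the method names are all hypothetical, and the real asciifolding filter handles far more characters and may emit the tokens in a different order.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FoldingSketch
{
    // Tiny, hypothetical folding table; the real filter covers many more characters.
    private static readonly Dictionary<char, char> FoldTable = new Dictionary<char, char>
    {
        { 'ş', 's' }, { 'Ş', 'S' }, { 'ç', 'c' }, { 'Ç', 'C' },
        { 'ö', 'o' }, { 'Ö', 'O' }, { 'ü', 'u' }, { 'Ü', 'U' },
        { 'ğ', 'g' }, { 'Ğ', 'G' }
    };

    // Replaces every character found in the table with its ASCII counterpart.
    public static string Fold(string token)
    {
        var sb = new StringBuilder(token.Length);
        foreach (var ch in token)
        {
            sb.Append(FoldTable.TryGetValue(ch, out var folded) ? folded : ch);
        }
        return sb.ToString();
    }

    // Mimics preserve_original: keep the original token and add the folded
    // form only when folding actually changed something.
    public static List<string> FoldPreservingOriginal(string token)
    {
        var tokens = new List<string> { token };
        var foldedToken = Fold(token);
        if (foldedToken != token)
        {
            tokens.Add(foldedToken);
        }
        return tokens;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var name in new[] { "Ayse", "Ayşe" })
        {
            Console.WriteLine(name + " -> " + string.Join(", ", FoldingSketch.FoldPreservingOriginal(name)));
        }
    }
}
```

With this table, Ayse produces the single token Ayse, while Ayşe produces both Ayşe and Ayse, which is why a search for either spelling can find both documents.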

Searching for Ayşe

var response = client.Search<Document>(s => s
    .Query(q => q
        .QueryString(qs => qs
            .Fields(f => f
                .Field(c => c.Name.Suffix("folding"))
            )
            .Query("Ayşe")
        )
    )
);

yields

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.163388,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "2",
      "_score" : 1.163388,
      "_source" : {
        "id" : 2,
        "name" : "Ayşe"
      }
    }, {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.3038296,
      "_source" : {
        "id" : 1,
        "name" : "Ayse"
      }
    } ]
  }
}

_source contains the original text that was sent to Elasticsearch, so by using response.Documents you will get the original names, for example

string.Join(",", response.Documents.Select(d => d.Name));
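
As a small illustration that needs no cluster, the sketch below applies the same string.Join call to an in-memory list. SampleDocuments is a hypothetical stand-in for response.Documents, filled by hand in the hit order of the search response shown earlier; JoinNames is likewise a made-up helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Document
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Program
{
    // Hypothetical stand-in for response.Documents (highest-scoring hit first).
    public static List<Document> SampleDocuments()
    {
        return new List<Document>
        {
            new Document { Id = 2, Name = "Ayşe" },
            new Document { Id = 1, Name = "Ayse" }
        };
    }

    // Joins the stored names exactly as the one-liner above does.
    public static string JoinNames(IEnumerable<Document> documents)
    {
        return string.Join(",", documents.Select(d => d.Name));
    }

    public static void Main()
    {
        Console.WriteLine(JoinNames(SampleDocuments())); // prints "Ayşe,Ayse"
    }
}
```

The names come back with their accents intact because they are read from the stored source, not from the folded tokens.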

When a query that undergoes analysis is performed against the Name property, the query is analyzed with the folding-analyzer and a search for matches is performed:

Index time
----------
document 1 name: Ayse --analysis--> Ayse
document 2 name: Ayşe --analysis--> Ayşe, Ayse

Query time
-----------
query_string query input: Ayşe --analysis--> Ayşe, Ayse
search for documents with tokens for name field matching Ayşe or Ayse
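
The index-time and query-time steps above can be imitated in a few lines of plain C#. This is only a rough sketch under stated assumptions: MatchSketch, Analyze and SharedTokens are hypothetical names, Analyze handles a single word and only the character ş, and the shared-token count is a crude stand-in for relevance, not the scoring that Elasticsearch actually performs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchSketch
{
    // Hypothetical stand-in for the folding-analyzer on a single word:
    // keeps the original token and adds the folded one (only 'ş' is handled).
    public static HashSet<string> Analyze(string text)
    {
        return new HashSet<string> { text, text.Replace('ş', 's') };
    }

    // Number of query tokens that are also among the tokens indexed for the name.
    public static int SharedTokens(string indexedName, string query)
    {
        var indexedTokens = Analyze(indexedName);
        return Analyze(query).Count(token => indexedTokens.Contains(token));
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var query in new[] { "Ayşe", "Ayse" })
        {
            foreach (var name in new[] { "Ayşe", "Ayse" })
            {
                Console.WriteLine($"query {query} vs document {name}: {MatchSketch.SharedTokens(name, query)} shared token(s)");
            }
        }
    }
}
```

Every combination shares at least one token, so both documents match either spelling. The query Ayşe shares two tokens with the document Ayşe and one with Ayse, which mirrors the reason given for the higher score of the accented match.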