ElasticSearch - define custom letter order for sorting
Problem description
I'm using ElasticSearch 2.4.2 (via HibernateSearch 5.7.1.Final from Java).
I have a problem with string sorting. The language of my application has diacritics, which have a specific alphabetic ordering. For example, Ł goes directly after L, Ó goes after O, etc. So you are supposed to sort the strings like this:
Dla
Dła
Doa
Dóa
Dza
Eza
ElasticSearch sorts the typical letters first and moves all the strange letters to the end:
Dla
Doa
Dza
Dła
Dóa
Eza
Can I add a custom letter ordering for ElasticSearch? Maybe there are some plugins for this? Do I need to write my own plugin? How do I start?
I found a plugin for the Polish language for ElasticSearch, but as I understand it, it is for analysis, and analysis is not a solution in my case, because it will ignore diacritics and leave words with L and Ł mixed:
Dla
Dłb
Dlc
This would sometimes be acceptable, but it is not acceptable in my specific use case.
I would appreciate any help with this.
Recommended answer
I've never used it, but there is a plugin that could fit your needs: the ICU collation plugin.
You will have to use the icu_collation token filter, which turns the tokens into collation keys. For that reason, you will need to use a separate @Field (e.g. myField_sort) in Hibernate Search.
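To illustrate what "turning tokens into collation keys" means, here is a minimal sketch that uses only the JDK's java.text classes. It is not the ICU plugin and not what Elasticsearch runs internally. The class name CollationKeyDemo and the tailoring rules are assumptions made for this illustration: they cover only Ł and Ó, whereas a real Polish collation handles more letters. The idea is the same, though: each string is converted to a key, and comparing the keys yields the language-specific order.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationKeyDemo {

    // Hypothetical tailoring, for illustration only: l-stroke (U+0142/U+0141)
    // sorts right after L, and o-acute (o + U+0301) sorts right after O.
    static final String TAILORING =
            "& L < \u0142 , \u0141 & O < o\u0301 , O\u0301";

    static Collator buildCollator() {
        try {
            // Start from the JDK's root rules and append the tailoring.
            RuleBasedCollator root = (RuleBasedCollator) Collator.getInstance(Locale.ROOT);
            RuleBasedCollator tailored = new RuleBasedCollator(root.getRules() + TAILORING);
            // Decompose precomposed characters so that U+00F3 matches "o + U+0301".
            tailored.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
            return tailored;
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Converts every word to a collation key, sorts the keys, and maps them back.
    static List<String> sortByCollationKey(List<String> words) {
        Collator collator = buildCollator();
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        Collections.sort(keys);
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // The six words from the question, written with escapes to stay ASCII-only.
        List<String> words = Arrays.asList(
                "Eza", "Dza", "D\u00F3a", "Doa", "D\u0142a", "Dla");
        System.out.println(sortByCollationKey(words));
    }
}
```

Because the keys themselves are opaque and only meaningful for comparison, they are of no use for full-text search, which is why the sort field is kept separate from the searchable one.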
You can assign a specific analyzer to your field with @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer")), and define this analyzer (type, parameters) with something like this on one of your entities:
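import javax.persistence.Entity;
import org.apache.lucene.analysis.core.KeywordTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;
import org.hibernate.search.elasticsearch.analyzer.ElasticsearchTokenFilterFactory;

@Entity
@Indexed
@AnalyzerDef(
    name = "myCollationAnalyzer",
    // Assumption, not in the original answer: @AnalyzerDef expects a tokenizer; a keyword tokenizer keeps the whole value as one token
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(
            name = "polish_collation",
            factory = ElasticsearchTokenFilterFactory.class,
            // Parameter values are passed to Elasticsearch as JSON, hence the inner single quotes
            params = {
                @Parameter(name = "type", value = "'icu_collation'"),
                @Parameter(name = "language", value = "'pl'")
            }
        )
    }
)
public class MyEntity {
    // ...
}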
See the documentation for more information: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_custom_analyzers
It's admittedly a bit clumsy right now, but analyzer configuration will get a bit cleaner in the next Hibernate Search version with normalizers and analyzer definition providers.
Note: as usual, your field will need to be declared as sortable (@SortableField(forField = "myField_sort")).
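Putting the annotations mentioned in this answer together, the field mapping might look like the sketch below. The property name myField and the id property are placeholders, not part of the original answer. The analyzer definition myCollationAnalyzer is assumed to be declared elsewhere, as in the @AnalyzerDef example. This is an untested sketch against the Hibernate Search 5.7 annotations, not a verified mapping.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Fields;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.SortableField;

@Entity
@Indexed
public class MyEntity {

    @Id
    private Long id; // placeholder identifier

    @Fields({
        // Regular field, analyzed as usual and used for searching
        @Field,
        // Separate field holding collation keys, used only for sorting
        @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer"))
    })
    @SortableField(forField = "myField_sort")
    private String myField; // placeholder property name
}
```

Queries would then search on myField and sort on myField_sort, for example through the query DSL's sort().byField("myField_sort").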