How to sort on an analyzed/tokenized field in Elasticsearch?
Problem Description
We're storing a `title` field in our index and want to use the field for two purposes:
- We're analyzing it with an ngram filter so we can provide autocomplete and instant results
- We want to be able to list results using an ASC sort on the `title` field rather than by score
The index/filter/analyzer is defined like so:
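```php
array(
    'number_of_shards' => $this->shards,
    'number_of_replicas' => $this->replicas,
    'analysis' => array(
        'filter' => array(
            'nGram_filter' => array(
                'type' => 'nGram',
                'min_gram' => 2,
                'max_gram' => 20,
                // Note: token_chars is documented for the nGram tokenizer; the nGram
                // token filter only documents min_gram and max_gram.
                'token_chars' => array('letter', 'digit', 'punctuation', 'symbol')
            )
        ),
        'analyzer' => array(
            // Index time: strip HTML, split on whitespace, lowercase, fold accents,
            // then emit 2-20 character grams for autocomplete.
            'index_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase', 'asciifolding', 'nGram_filter')
            ),
            // Query time: the same chain without the ngram step.
            'search_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase', 'asciifolding')
            )
        )
    )
),
```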
The problem we're experiencing is unpredictable results when we sort on the `title` field. After doing a little searching, we found this at the end of the sort reference page in the Elasticsearch documentation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_memory_considerations):
For string based types, the field sorted on should not be analyzed / tokenized.
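To see why sorting on the ngram-analyzed `title` gives odd results, here is a simplified PHP model. It is an illustration, not Elasticsearch code. It assumes that every indexed term of a document takes part in the sort and that an ascending sort picks the lowest term, which is how the sort documentation describes the default `min` mode for multi-valued fields. `html_strip` and `asciifolding` are left out, and the function names (`ngramTerms`, `ascendingSortKey`) and sample titles are hypothetical.

```php
<?php
// Simplified model (an assumption, not Elasticsearch internals): whitespace split,
// lowercase, then every gram between minGram and maxGram characters long.
function ngramTerms(string $title, int $minGram = 2, int $maxGram = 20): array
{
    $terms = array();
    foreach (preg_split('/\s+/', strtolower(trim($title))) as $token) {
        $len = strlen($token);
        for ($start = 0; $start < $len; $start++) {
            for ($size = $minGram; $size <= $maxGram && $start + $size <= $len; $size++) {
                $terms[] = substr($token, $start, $size);
            }
        }
    }
    $terms = array_values(array_unique($terms));
    sort($terms, SORT_STRING);
    return $terms;
}

// With many terms per document, an ascending sort in "min" mode uses the lowest term.
function ascendingSortKey(string $title): string
{
    $terms = ngramTerms($title);
    return $terms[0] ?? '';
}

$titles = array('Dog', 'Zebra', 'Cat');
usort($titles, function ($a, $b) {
    return strcmp(ascendingSortKey($a), ascendingSortKey($b));
});

foreach ($titles as $title) {
    echo $title, ' => ', ascendingSortKey($title), PHP_EOL;
}
// Prints, one per line: Cat => at, Zebra => br, Dog => do
```

With the question's `min_gram` of 2, this model puts "Zebra" (lowest gram "br") ahead of "Dog" (lowest gram "do"), which is the kind of ordering that looks unpredictable when you expect alphabetical titles.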
How can we both analyze the field and sort on it later? Do we need to store the field twice, once with `not_analyzed`, in order to sort? Since `_source` also stores the `title` value in its original state, can that not be used for sorting?
Answer
You can use Elasticsearch's built-in multi-field type.
The `multi_field` type allows you to map several core types of the same value. This can come in very handy, for example, when you want to map a string type twice: once analyzed and once `not_analyzed`.
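A minimal sketch of a 0.90.x `multi_field` mapping for `title`, written as a PHP array in the same style as the question's settings. The type name `article` and the sub-field name `raw` are assumptions for illustration; the analyzer names are the ones defined in the question.

```php
<?php
// 0.90.x-style multi_field mapping (sketch; "article" and "raw" are hypothetical names).
$mapping = array(
    'article' => array(
        'properties' => array(
            'title' => array(
                'type' => 'multi_field',
                'fields' => array(
                    // Default sub-field, same name as the field: ngram-analyzed for search.
                    'title' => array(
                        'type' => 'string',
                        'index_analyzer' => 'index_analyzer',
                        'search_analyzer' => 'search_analyzer',
                    ),
                    // Untouched copy of the value, addressable as "title.raw" for sorting.
                    'raw' => array(
                        'type' => 'string',
                        'index' => 'not_analyzed',
                    ),
                ),
            ),
        ),
    ),
);

echo json_encode($mapping, JSON_PRETTY_PRINT), PHP_EOL;
```

The value is still sent once in the document; Elasticsearch indexes it into both sub-fields.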
In the Elasticsearch Reference, see the String Sorting and Multi Fields guide for how to set up what you need.
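Once a `not_analyzed` copy exists, a search can match on the analyzed field and sort on the copy. This sketch assumes the copy is reachable as `title.raw` (a hypothetical name), and the user input `zeb` is made up.

```php
<?php
// Search body sketch: match against the ngram-analyzed "title",
// sort on the assumed not_analyzed sub-field "title.raw".
$userInput = 'zeb';

$searchBody = array(
    'query' => array(
        'match' => array(
            'title' => $userInput, // analyzed: autocomplete via ngram terms
        ),
    ),
    'sort' => array(
        // One exact term per document, so the order is the plain string order.
        array('title.raw' => array('order' => 'asc')),
    ),
);

echo json_encode($searchBody, JSON_PRETTY_PRINT), PHP_EOL;
```

A `not_analyzed` value is compared as-is, so uppercase titles sort before lowercase ones. The String Sorting guide is the place to look if you need case-insensitive ordering.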
Please note that the multi-field mapping configuration changed between Elasticsearch 0.90.x and 1.x, so use the version of the guide that matches your Elasticsearch version.
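In 1.x, the same idea is expressed by keeping `title` as an ordinary `string` field and declaring extra sub-fields under `fields`. This is again a sketch: `article` and `raw` are hypothetical names, and the field-level `index_analyzer`/`search_analyzer` settings assume a 1.x cluster.

```php
<?php
// 1.x-style multi-field mapping (sketch; "article" and "raw" are hypothetical names).
$mapping = array(
    'article' => array(
        'properties' => array(
            'title' => array(
                'type' => 'string',
                // Main field keeps the question's ngram analyzers.
                'index_analyzer' => 'index_analyzer',
                'search_analyzer' => 'search_analyzer',
                'fields' => array(
                    // Untouched copy, addressable as "title.raw" for sorting.
                    'raw' => array(
                        'type' => 'string',
                        'index' => 'not_analyzed',
                    ),
                ),
            ),
        ),
    ),
);

echo json_encode($mapping, JSON_PRETTY_PRINT), PHP_EOL;
```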