How to sort on an analyzed/tokenized field in Elasticsearch?


Problem description

We're storing a title field in our index and want to use the field for two purposes:

  1. We're analyzing with an ngram filter so we can provide autocomplete and instant results
  2. We want to be able to list results using an ASC sort on the title field rather than score.

The index/filter/analyzer is defined like so:

array(
    'number_of_shards' => $this->shards,
    'number_of_replicas' => $this->replicas,
    'analysis' => array(
        'filter' => array(
            'nGram_filter' => array(
                'type' => 'nGram',
                'min_gram' => 2,
                'max_gram' => 20,
                'token_chars' => array('letter','digit','punctuation','symbol')
            )
        ),

        'analyzer' => array(
            'index_analyzer' => array(
                'type' => 'custom',
                'tokenizer' =>'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding','nGram_filter')
            ),
            'search_analyzer' => array(
                'type' => 'custom',
                'tokenizer' =>'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding')
            )
        )
    )
),

The problem we're experiencing is unpredictable results when we sort on the title field. After a little searching, we found this at the end of the sort page in the Elasticsearch reference (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_memory_considerations):

For string based types, the field sorted on should not be analyzed / tokenized.

How can we both analyze the field and sort on it later? Do we need to store the field twice, with one copy set to not_analyzed, in order to sort? Since _source also stores the title value in its original state, can that not be used for sorting?

Solution

You can use the built-in Multi Field Type in Elasticsearch.

The multi_field type allows to map several core_types of the same value. This can come very handy, for example, when wanting to map a string type, once when it’s analyzed and once when it’s not_analyzed.
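As a rough illustration of the quoted description, a 0.90.x-style mapping for the title field might look like the sketch below. The sub-field name `raw` and the variable name `$titleMapping` are assumptions made for this example; the analyzer names are the ones defined in the question.

```php
<?php
// Sketch only: a 0.90.x-style multi_field mapping for "title".
// "raw" is a hypothetical sub-field name chosen for this example.
$titleMapping = array(
    'title' => array(
        'type' => 'multi_field',
        'fields' => array(
            // Default sub-field: analyzed with the ngram analyzers from the question.
            'title' => array(
                'type' => 'string',
                'index_analyzer' => 'index_analyzer',
                'search_analyzer' => 'search_analyzer',
            ),
            // Untokenized copy of the same value, intended only for sorting.
            'raw' => array(
                'type' => 'string',
                'index' => 'not_analyzed',
            ),
        ),
    ),
);

// Print the JSON that would be sent as the "properties" part of the mapping.
echo json_encode($titleMapping), "\n";
```

With a mapping along these lines, searches would keep using `title`, while sorting would target `title.raw`.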

In the Elasticsearch Reference, see the String Sorting and Multi Fields guide for how to set up what you need.

Please note that the Multi Field mapping configuration changed between Elasticsearch 0.90.X and 1.X, so use the guide that matches your version.
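For 1.X, the same idea is expressed with a `fields` block on an ordinary string field. The sketch below is a minimal example under assumptions: the sub-field name `raw`, the sample query text, and the variable names are hypothetical, and the analyzer names come from the question.

```php
<?php
// Sketch only: 1.X-style mapping where "title" stays analyzed and gains
// a not_analyzed sub-field. "raw" is a hypothetical name for this example.
$titleMapping = array(
    'title' => array(
        'type' => 'string',
        'index_analyzer' => 'index_analyzer',
        'search_analyzer' => 'search_analyzer',
        'fields' => array(
            'raw' => array('type' => 'string', 'index' => 'not_analyzed'),
        ),
    ),
);

// Search body: match against the analyzed field, sort on the untokenized copy.
$searchBody = array(
    'query' => array('match' => array('title' => 'elas')), // sample query text
    'sort' => array(
        array('title.raw' => array('order' => 'asc')),
    ),
);

echo json_encode($titleMapping), "\n";
echo json_encode($searchBody), "\n";
```

Because a not_analyzed value is indexed as a single unmodified term, sorting on it is case-sensitive; if that matters, a separate lowercased sort field may be worth considering.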
