Fields not getting sorted in alphabetical order in Elasticsearch


Problem description

I have a few documents with a name field. I am using the analyzed version of the name field for search and a not_analyzed version for sorting. The sorting works at one level: the names are grouped alphabetically by their first letter. But within each letter, the names are sorted lexicographically rather than alphabetically. Here is the mapping I have used:

{
  "mappings": {
    "seing": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Can anyone suggest a solution?

Solution

Digging into the Elasticsearch documentation, I stumbled upon this:

Case-Insensitive Sorting

Imagine that we have three user documents whose name fields contain Boffey, BROWN, and bailey, respectively. First we will apply the technique described in String Sorting and Multifields of using a not_analyzed field for sorting:

PUT /my_index
{
  "mappings": {
    "user": {
      "properties": {
        "name": {                    //1
          "type": "string",
          "fields": {
            "raw": {                 //2
              "type":  "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

  1. The analyzed name field is used for search.
  2. The not_analyzed name.raw field is used for sorting.

A search request sorted on name.raw would return the documents in this order: BROWN, Boffey, bailey. This is known as lexicographical order as opposed to alphabetical order. Essentially, the bytes used to represent capital letters have a lower value than the bytes used to represent lowercase letters, and so the names are sorted with the lowest bytes first.
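To make the byte-order point concrete, here is a minimal Python sketch. It does not talk to Elasticsearch; sorting on the UTF-8 bytes of each string is only an assumed stand-in for how the single unanalyzed term of each document is compared.

```python
# Names from the example above.
names = ["Boffey", "BROWN", "bailey"]

# Sort by the UTF-8 bytes of each value. This is an assumption used for
# illustration, not a call into Elasticsearch itself.
byte_order = sorted(names, key=lambda s: s.encode("utf-8"))

# 'B' is byte 66, 'R' is 82, 'b' is 98 and 'o' is 111, so any uppercase
# letter compares lower than any lowercase letter.
print(byte_order)  # ['BROWN', 'Boffey', 'bailey']
```

"BROWN" lands before "Boffey" because the second characters are compared next, and 'R' (82) is lower than 'o' (111).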

That may make sense to a computer, but doesn’t make much sense to human beings who would reasonably expect these names to be sorted alphabetically, regardless of case. To achieve this, we need to index each name in a way that the byte ordering corresponds to the sort order that we want.

In other words, we need an analyzer that will emit a single lowercase token.

Following this logic, instead of indexing the raw value as-is, you need to lowercase it using a custom analyzer built on the keyword tokenizer:

PUT /my_index
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "case_insensitive_sort" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        }
      }
    }
  },
  "mappings" : {
    "seing" : {
      "properties" : {
        "name" : {
          "type" : "string",
          "fields" : {
            "raw" : {
              "type" : "string",
              "analyzer" : "case_insensitive_sort"
            }
          }
        }
      }
    }
  }
}

Now ordering by name.raw should sort in alphabetical order, rather than lexicographical.
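As a rough illustration of what the case_insensitive_sort analyzer does to each value, the sketch below mimics the two stages in plain Python. The function names are hypothetical and simply mirror the tokenizer and filter named in the settings; the real analysis happens inside Elasticsearch at index time.

```python
def keyword_tokenizer(text):
    """Emit the whole input as a single token, like the `keyword` tokenizer."""
    return [text]


def lowercase_filter(tokens):
    """Lowercase every token, like the `lowercase` token filter."""
    return [token.lower() for token in tokens]


def case_insensitive_sort(text):
    """Hypothetical stand-in for the analyzer chain: tokenizer, then filter."""
    return lowercase_filter(keyword_tokenizer(text))


names = ["Boffey", "BROWN", "bailey"]

# Each name yields exactly one token, and that token is what gets compared.
ordered = sorted(names, key=lambda name: case_insensitive_sort(name)[0])
print(ordered)  # ['bailey', 'Boffey', 'BROWN']
```

Because the whole name stays one token, multi-word names are still sorted as a unit rather than word by word.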

Quick test done on my local machine using Marvel:

Index structure:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_sort": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "user": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            },
            "keyword": {
              "type": "string",
              "analyzer": "case_insensitive_sort"
            }
          }
        }
      }
    }
  }
}

Test data:

PUT /my_index/user/1
{
  "name": "Tim"
}

PUT /my_index/user/2
{
  "name": "TOM"
}

Query using raw field:

POST /my_index/user/_search
{
  "sort": "name.raw"
}

Result:

{
  "_index" : "my_index",
  "_type" : "user",
  "_id" : "2",
  "_score" : null,
  "_source" : {
    "name" : "TOM"
  },
  "sort" : [
    "TOM"
  ]
},
{
  "_index" : "my_index",
  "_type" : "user",
  "_id" : "1",
  "_score" : null,
  "_source" : {
    "name" : "Tim"
  },
  "sort" : [
    "Tim"
  ]
}

Query using lowercased string:

POST /my_index/user/_search
{
  "sort": "name.keyword"
}

Result:

{
  "_index" : "my_index",
  "_type" : "user",
  "_id" : "1",
  "_score" : null,
  "_source" : {
    "name" : "Tim"
  },
  "sort" : [
    "tim"
  ]
},
{
  "_index" : "my_index",
  "_type" : "user",
  "_id" : "2",
  "_score" : null,
  "_source" : {
    "name" : "TOM"
  },
  "sort" : [
    "tom"
  ]
}

I suspect the second result is the one you want.
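The two result sets can be reproduced offline with a short Python sketch. It assumes that name.raw sorts on the value unchanged and name.keyword sorts on the lowercased value; the sort_hits helper is hypothetical and only imitates the id and sort value of each hit.

```python
# The two test documents indexed above.
docs = [{"_id": "1", "name": "Tim"}, {"_id": "2", "name": "TOM"}]


def sort_hits(docs, sort_value):
    """Return (id, sort value) pairs ordered by the computed sort value."""
    hits = [(doc["_id"], sort_value(doc["name"])) for doc in docs]
    return sorted(hits, key=lambda hit: hit[1])


# name.raw: the value is compared as stored, so 'O' (79) beats 'i' (105).
raw_hits = sort_hits(docs, lambda name: name)

# name.keyword: the value is lowercased first, so 'i' comes before 'o'.
keyword_hits = sort_hits(docs, str.lower)

print(raw_hits)      # [('2', 'TOM'), ('1', 'Tim')]
print(keyword_hits)  # [('1', 'tim'), ('2', 'tom')]
```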
