ElasticSearch Aggregation + Sorting on a Non-Numeric Field (5.3)

Problem description

I want to aggregate the data on one field and also get the aggregated data sorted by name.

My data:

{
    "_index": "testing-aggregation",
    "_type": "employee",
    "_id": "emp001_local000000000000001",
    "_score": 10.0,
    "_source": {
        "name": [
            "Person 01"
        ],
        "groupbyid": [
            "group0001"
        ],
        "ranking": [
             "2.0"
        ]
    }
},
{
    "_index": "testing-aggregation",
    "_type": "employee",
    "_id": "emp002_local000000000000001",
    "_score": 85146.375,
    "_source": {
        "name": [
            "Person 02"
        ],
        "groupbyid": [
            "group0001"
        ],
        "ranking": [
             "10.0"
        ]
    }
},
{
    "_index": "testing-aggregation",
    "_type": "employee",
    "_id": "emp003_local000000000000001",
    "_score": 20.0,
    "_source": {
        "name": [
            "Person 03"
        ],
        "groupbyid": [
            "group0002"
        ],
        "ranking": [
             "-1.0"
        ]
    }
},
{
    "_index": "testing-aggregation",
    "_type": "employee",
    "_id": "emp004_local000000000000001",
    "_score": 5.0,
    "_source": {
        "name": [
            "Person 04"
        ],
        "groupbyid": [
            "group0002"
        ],
        "ranking": [
             "2.0"
        ]
    }
}

My query:

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "name:emp*^1000.0"
                    }
                }
            ]
        }
    },
    "aggs": {
        "groupbyid": {
            "terms": {
                "field": "groupbyid.raw",
                "order": {
                    "top_hit_agg": "desc"
                },
                "size": 10
            },
            "aggs": {
                "top_hit_agg": {
                    "terms": {
                        "field": "name"
                    }
                }
            }
        }
    }
}

My mapping:

{
    "name": {
        "type": "text",
        "fielddata": true,
        "fields": {
            "lower_case_sort": {
                "type": "text",
                "fielddata": true,
                "analyzer": "case_insensitive_sort"
            }
        }
    },
    "groupbyid": {
        "type": "text",
        "fielddata": true,
        "index": "analyzed",
        "fields": {
            "raw": {
                "type": "keyword",
                "index": "not_analyzed"
            }
        }
    }
}

I am getting data based on the average of the relevance of grouped records. Now, what I want is to first club (group) the records by groupbyid and then, within each bucket, sort the data by the name field.

I want to group on one field and then, within each grouped bucket, sort on another field. The above is sample data.

There are other fields, such as created_on and updated_on, and I also want to get the data sorted by those fields, as well as grouped alphabetically.

I want to sort on a non-numeric (string) field; sorting on numeric fields already works.

It works for the ranking field but not for the name field, which gives the following error:

Expected numeric type on field [name], but got [text]; 


Recommended answer

You're asking for a few things, so I'll try to answer them in turn.


Step 1: Sorting groups by average relevance

I am getting data based on the average of the relevance of grouped records.

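If this is what you're attempting to do, it's not what the aggregation you wrote is doing. Terms aggregations default to sorting the buckets by the number of documents in each bucket, descending. (Ordering by top_hit_agg won't work either: a terms aggregation can be ordered by _count, _term or a metric sub-aggregation, but not by a multi-bucket sub-aggregation such as another terms aggregation.) To sort the groups by "average relevance" (which I'll interpret as "average _score of documents in the group"), you'd need to add an avg sub-aggregation on the score and sort the terms aggregation by that: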

"aggregations": {
  "most_relevant_groups": {
    "terms": {
      "field": "groupbyid.raw",
      "order": {
        "average_score": "desc"
      }
    },
    "aggs": {
      "average_score": {
        "avg": {
          "script": {
            "inline": "_score",
            "lang": "painless"
          }
        }
      }
    }
  }
}
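To see what that ordering does, the Python sketch below reproduces the terms + avg logic offline, using the _score values shown in the sample hits: it groups hits by groupbyid, averages the scores in each group and orders the buckets by that average, descending. It only illustrates the aggregation's semantics; it does not talk to Elasticsearch, and the function name groups_by_average_score is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hits copied from the sample data above: (_id, groupbyid, _score)
HITS = [
    ("emp001_local000000000000001", "group0001", 10.0),
    ("emp002_local000000000000001", "group0001", 85146.375),
    ("emp003_local000000000000001", "group0002", 20.0),
    ("emp004_local000000000000001", "group0002", 5.0),
]


def groups_by_average_score(hits):
    """Hypothetical helper mimicking a terms agg on groupbyid that is
    ordered by an avg(_score) sub-aggregation, descending."""
    scores = defaultdict(list)
    for _id, group, score in hits:
        scores[group].append(score)
    buckets = [
        {"key": group, "doc_count": len(s), "average_score": mean(s)}
        for group, s in scores.items()
    ]
    # Equivalent of "order": {"average_score": "desc"}
    return sorted(buckets, key=lambda b: b["average_score"], reverse=True)


for bucket in groups_by_average_score(HITS):
    print(bucket)
```

With these scores, group0001 (average 42578.1875) comes before group0002 (average 12.5), even though both buckets hold two documents and would tie under the default doc-count ordering.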

Step 2: Sorting employees by name

Now, what I want is to first club (group) the records by groupbyid and then, within each bucket, sort the data by the name field.

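To sort the documents within each bucket, you can use a top_hits aggregation. This snippet still orders the groups by the average_score sub-aggregation from the previous snippet, which is omitted here for brevity; the complete request under step 3 includes both: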

"aggregations": {
  "most_relevant_groups": {
    "terms": {
      "field": "groupbyid.raw",
      "order": {
        "average_score": "desc"
      }
    },
    "aggs": {
      "employees": {
        "top_hits": {
          "size": 10,  // top_hits returns only 3 hits per bucket by default; change as needed
          "sort": [
            {
              "name.lower_case_sort": {
                "order": "asc"
              }
            }
          ]
        }
      }
    }
  }
}
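Why sort on name.lower_case_sort rather than on name? When an ascending sort hits a field with several terms per document, Elasticsearch keys each document by its smallest term by default (sort mode min), so the analyzer decides what is actually compared. The mapping in the question doesn't show how case_insensitive_sort is defined; the Python sketch below assumes the common keyword tokenizer + lowercase filter combination, approximates the default standard analyzer with a simple lowercase word split, and compares the resulting orders. "Carl Aaron" and "Bob Smith" are hypothetical names added only for contrast.

```python
import re


def standard_like_terms(value):
    """Rough stand-in for the standard analyzer: lowercase, split into words."""
    return re.findall(r"\w+", value.lower())


def keyword_lowercase_terms(value):
    """Assumed case_insensitive_sort analyzer: whole value as one lowercase term."""
    return [value.lower()]


def sort_asc(names, analyze):
    """Sort like an asc field sort on a fielddata-enabled text field:
    each document is keyed by its smallest term (sort mode "min")."""
    return sorted(names, key=lambda name: min(analyze(name)))


# Two sample names from the question plus two hypothetical ones.
names = ["Person 02", "Person 01", "Carl Aaron", "Bob Smith"]
print(sort_asc(names, standard_like_terms))
# ['Person 01', 'Person 02', 'Carl Aaron', 'Bob Smith']
print(sort_asc(names, keyword_lowercase_terms))
# ['Bob Smith', 'Carl Aaron', 'Person 01', 'Person 02']
```

With word-level terms, "Carl Aaron" sorts by "aaron" and the digit-only terms "01"/"02" come first, which is not alphabetical by full name; keeping the whole lowercased value as a single term gives the expected case-insensitive order.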

Step 3: Putting it all together

Putting both of the above together, the following request should suit your needs. Note that I used a function_score query to simulate "relevance" based on ranking, which requires ranking to be mapped as a numeric field; your own query can be anything that produces the relevance you need:

POST /testing-aggregation/employee/_search
{
  "size": 0,
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "field": "ranking"
          }
        }
      ]
    }
  },
  "aggs": {
    "groupbyid": {
      "terms": {
        "field": "groupbyid.raw",
        "size": 10,
        "order": {
          "average_score": "desc"
        }
      },
      "aggs": {
        "average_score": {
          "avg": {
            "script": {
              "inline": "_score",
              "lang": "painless"
            }
          }
        },
        "employees": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "name.lower_case_sort": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
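As a rough offline sanity check of what this request should return for the four sample employees, the Python sketch below mimics the whole pipeline. It assumes ranking is mapped as a numeric field and that each hit's _score is just its ranking: function_score defaults to a match_all query (score 1.0) with multiply boost mode, and field_value_factor defaults to factor 1 with no modifier. It passes the -1.0 ranking through unchanged; how a real cluster treats a negative score may depend on the version. The function simulate and its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Sample employees from the question: (name, groupbyid, ranking).
# Assumption: ranking is mapped as a numeric field in the index.
EMPLOYEES = [
    ("Person 01", "group0001", 2.0),
    ("Person 02", "group0001", 10.0),
    ("Person 03", "group0002", -1.0),
    ("Person 04", "group0002", 2.0),
]


def simulate(employees, groups_size=10, hits_size=10):
    """Offline approximation of the step 3 request (not Elasticsearch itself)."""
    groups = defaultdict(list)
    for name, group, ranking in employees:
        # match_all score (1.0) multiplied by field_value_factor(ranking)
        score = 1.0 * ranking
        groups[group].append((name, score))

    buckets = []
    for group, hits in groups.items():
        buckets.append({
            "key": group,
            # avg sub-aggregation over _score
            "average_score": mean(score for _, score in hits),
            # top_hits sorted by name.lower_case_sort asc, limited to hits_size
            "employees": sorted((n for n, _ in hits), key=str.lower)[:hits_size],
        })
    # terms "order": {"average_score": "desc"}, limited to groups_size buckets
    buckets.sort(key=lambda b: b["average_score"], reverse=True)
    return buckets[:groups_size]


for bucket in simulate(EMPLOYEES):
    print(bucket)
```

Under these assumptions group0001 (average 6.0) is returned before group0002 (average 0.5), and each bucket lists its employees alphabetically.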