Script-based sorting on Elasticsearch date field

Question

I am just getting started with Elasticsearch and would like to use script-based sorting on a field that is mapped as date, format hour_minute. There can be multiple instances of the field in each document.

Before introducing expressions, as a first step I'm trying a simple sort (using the Sense plugin):

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc['someTime'].value",
         "lang": "groovy",
         "type": "date",
         "order": "asc"
      }
   }
}

I get this error (fragment):

SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;
shardFailures {[tjWL-zV5QXmGjNlXzLvrzw][myIndex][0]:
SearchParseException[[myIndex][0]: 
query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source…

If I post the above query with "type": "number" there is no error, although this of course doesn't sort by date. The following works fine:

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "someTime": {
         "order": "asc"
      }
   }
}

Ultimately I'd like to use script-based sorting, since I will be trying to query, filter or sort using date and time conditions: for example, query for documents with today's date, then sort them by the earliest time that is after the current time, and so on.

Any suggestions would be much appreciated.

Answer

Sorting documents with a script is not very performant, because the script has to be evaluated for every matching document, and this gets worse as your document base grows over time. So I'm going to offer a solution for doing that and then suggest another option.

To sort using a script, you need to transform your date into milliseconds so that the sort runs on a plain number (the type of a script sort can only be number or string, so make sure it is set to number).

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc['someTime'].date.getMillisOfDay()",
         "lang": "groovy",
         "type": "number",
         "order": "asc"
      }
   }
}

Note that depending on the granularity you want, you can also use getSecondOfDay() or getMinuteOfDay(). Provided your queries and filters have selected documents for the right day, your sort script will then order them by the number of milliseconds (or seconds, or minutes) elapsed within that day.
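To make the three granularities concrete, here is a small Python sketch that computes the same quantities for one instant. It is only an illustration of the arithmetic, assuming the instant is expressed in UTC; the sort itself runs the Groovy/Joda getters shown above, and the function name `time_of_day_units` is hypothetical.

```python
from datetime import datetime, timezone


def time_of_day_units(dt: datetime) -> dict:
    """Elapsed time since midnight at three granularities.

    Intended to match what Joda's getMillisOfDay(), getSecondOfDay() and
    getMinuteOfDay() report for the same instant in the same time zone.
    """
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return {
        "millisOfDay": seconds * 1000 + dt.microsecond // 1000,
        "secondOfDay": seconds,
        "minuteOfDay": dt.hour * 60 + dt.minute,
    }


# 2015-10-05T05:34:12.276Z, the example timestamp used in this answer
dt = datetime(2015, 10, 5, 5, 34, 12, 276000, tzinfo=timezone.utc)
print(time_of_day_units(dt))
```

Coarser units lose the ordering between documents that fall within the same second or minute, so pick the finest granularity your data actually needs.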

The second solution is to also index the number of milliseconds (or seconds, or minutes) since the beginning of that day into another field and simply sort on it, so that you don't need a script at all. The bottom line is that any information you need at search time that can already be known at index time should be indexed rather than computed in real time.

For instance, if your someTime field contains the date 2015-10-05T05:34:12.276Z, then you'd index the millisOfDay field with the value 20052276, which is:

  • 5 hours * 3600000 ms
  • + 34 minutes * 60000 ms
  • + 12 seconds * 1000 ms
  • + 276 ms
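The index-time computation could be done on the client before sending each document. The sketch below assumes the someTime value is an ISO 8601 string like the example above and takes the time of day in the timestamp's own offset (UTC for a trailing Z); `millis_of_day` and `with_millis_of_day` are hypothetical helper names, not part of any Elasticsearch client.

```python
from datetime import datetime


def millis_of_day(iso_timestamp: str) -> int:
    """Milliseconds since midnight, in the timestamp's own UTC offset."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit +00:00 offset first.
    dt = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return ((dt.hour * 60 + dt.minute) * 60 + dt.second) * 1000 + dt.microsecond // 1000


def with_millis_of_day(doc: dict) -> dict:
    """Return a copy of doc with a millisOfDay field derived from someTime."""
    enriched = dict(doc)
    enriched["millisOfDay"] = millis_of_day(doc["someTime"])
    return enriched


doc = {"someTime": "2015-10-05T05:34:12.276Z"}
print(with_millis_of_day(doc))
```

The enriched dictionary is what you would then send to the index, so the millisOfDay sort needs no script at query time.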

Then you can use:

POST myIndex/_search
{
   "query": {
      "range": {
          "someTime": {
              "gt": "now"
          }
      }
   },
   "sort": {
      "millisOfDay": {
         "order": "asc"
      }
   }
}

Note that I've added a query to select only the documents whose someTime date is after now, so you'll get all documents in the future, sorted by ascending millisOfDay. As long as those documents fall on the same day, this means you'll get the one nearest to now first.

Update

If someTime has the format HH:mm, then you can also store its millisOfDay value; e.g. if someTime = 17:30, then millisOfDay would be (17 h * 3600000 ms) + (30 min * 60000 ms) = 63000000.
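A minimal sketch of that conversion for hour_minute values, assuming they arrive as "HH:mm" strings; `hh_mm_to_millis` is a hypothetical helper. Since the question says a document can hold several someTime values, the example maps each one, producing a list that could be indexed as a multi-valued millisOfDay field.

```python
def hh_mm_to_millis(value: str) -> int:
    """Convert an "HH:mm" string (the hour_minute format) to milliseconds since midnight."""
    hours, minutes = (int(part) for part in value.split(":"))
    if not (0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid HH:mm time: {value!r}")
    return hours * 3_600_000 + minutes * 60_000


# One document may contain several someTime values; convert each of them.
some_times = ["17:30", "08:05"]
print([hh_mm_to_millis(t) for t in some_times])  # [63000000, 29100000]
```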

Then your query needs to be reworked a little, using a script filter that compares the stored millisOfDay with the current time of day, like this:

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc.millisOfDay.value > new DateTime().millisOfDay"
        }
      }
    }
  },
  "sort": {
    "millisOfDay": {
      "order": "asc"
    }
  }
}
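To check what that request should return, here is a client-side simulation of its logic in Python: keep documents whose millisOfDay is strictly greater than the current millis of day, then sort ascending. It is only an illustration, not how Elasticsearch executes the request; the sample documents and the helper names `upcoming` and `millis_of_day_now` are hypothetical, and the current time is passed in explicitly so the result is deterministic.

```python
from datetime import datetime


def millis_of_day_now(now: datetime) -> int:
    """Milliseconds since midnight for the given 'current' time."""
    return ((now.hour * 60 + now.minute) * 60 + now.second) * 1000 + now.microsecond // 1000


def upcoming(docs: list, now_millis: int) -> list:
    """Mimic the script filter (millisOfDay > now) plus the ascending millisOfDay sort."""
    kept = [d for d in docs if d["millisOfDay"] > now_millis]
    return sorted(kept, key=lambda d: d["millisOfDay"])


docs = [
    {"id": "a", "someTime": "17:30", "millisOfDay": 63_000_000},
    {"id": "b", "someTime": "09:15", "millisOfDay": 33_300_000},
    {"id": "c", "someTime": "12:00", "millisOfDay": 43_200_000},
]
now = datetime(2015, 10, 5, 10, 0)  # pretend it is 10:00
print([d["id"] for d in upcoming(docs, millis_of_day_now(now))])  # ['c', 'a']
```

Document b (09:15) is dropped because it is already past, and the remaining ones come back earliest first.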
