Script-based sorting on Elasticsearch date field

Question

I am just getting started with Elasticsearch and would like to use script-based sorting on a field that is mapped as date, format hour_minute. There can be multiple instances of the field in each document.

Before introducing expressions, as a first step I'm trying a simple sort (using the Sense plugin):

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc[\"someTime\"].value",
         "lang": "groovy",
         "type": "date",
         "order": "asc"
      }
   }
}

I get this error (fragment):

SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;
shardFailures {[tjWL-zV5QXmGjNlXzLvrzw][myIndex][0]:
SearchParseException[[myIndex][0]: 
query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source…

If I post the above query with "type": "number" there is no error, although this of course doesn't sort by date. The following works fine:

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "someTime": {
         "order": "asc"
      }
   }
}

Ultimately I'd like to use script-based sorting, since I will be trying to query, filter or sort using date and time conditions, e.g. query for documents with today's date, then sort them by the earliest time that is after the current time.

Any advice would be appreciated.

Recommended answer

Sorting documents with a script is not very performant, because the script has to run for every matching document at search time, and that cost grows as your document base grows. So I'll first show how to do it with a script and then suggest another option.

In order to sort using a script, you need to transform your date into milliseconds so the sort can run on a simple number (the script sort type can only be number or string, so set "type": "number").

POST myIndex/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc[\"someTime\"].date.getMillisOfDay()",
         "lang": "groovy",
         "type": "number",
         "order": "asc"
      }
   }
}

Note that depending on the granularity you want, you can also use getSecondOfDay() or getMinuteOfDay(). That way, provided your queries and filters have selected documents for the right day, your sort script will sort documents based on the number of milliseconds (or seconds or minutes) within that day.
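To make the three granularities concrete, here is a small Python sketch (not Elasticsearch code) that computes the same quantities for a timestamp. The helper name time_of_day_keys is hypothetical, and it assumes the value is interpreted in UTC, which is how Elasticsearch stores date fields.

```python
from datetime import datetime, timezone


def time_of_day_keys(ts: datetime) -> dict:
    """Elapsed time since midnight (UTC) at three granularities.

    Hypothetical helper that mirrors what Joda-Time's getMillisOfDay(),
    getSecondOfDay() and getMinuteOfDay() return for a UTC date.
    """
    utc = ts.astimezone(timezone.utc)
    millis = ((utc.hour * 60 + utc.minute) * 60 + utc.second) * 1000 + utc.microsecond // 1000
    return {
        "millisOfDay": millis,
        "secondOfDay": millis // 1000,     # whole seconds since midnight
        "minuteOfDay": millis // 60_000,   # whole minutes since midnight
    }


if __name__ == "__main__":
    ts = datetime(2015, 10, 5, 5, 34, 12, 276000, tzinfo=timezone.utc)
    print(time_of_day_keys(ts))
```

Coarser keys produce more ties: all documents within the same minute share one minuteOfDay value, so their relative order is left to other sort criteria.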

The second solution is to also index the number of milliseconds (or seconds, or minutes) since the beginning of the day into a separate field and simply sort on that field, so that no script is needed. The bottom line is that any information you need at search time that can already be known at index time should be indexed rather than computed in real time.

For instance, if your someTime field contains the date 2015-10-05T05:34:12.276Z, then you'd index the millisOfDay field with the value 20052276, which is

  • 5 hours * 3600000 ms
  • +34 minutes * 60000 ms
  • +12 seconds * 1000 ms
  • +276 ms
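The index-time approach can be sketched in Python as a step that enriches each document before it is sent to Elasticsearch. This is only an illustration under assumptions: the helper with_millis_of_day is hypothetical, and it assumes someTime arrives as an ISO 8601 string like the one above, interpreted in UTC.

```python
from datetime import datetime, timezone


def with_millis_of_day(source: dict) -> dict:
    """Return a copy of `source` with a precomputed `millisOfDay` field.

    Hypothetical index-time helper: the value is computed once here, so
    searches can sort on a plain numeric field instead of running a script.
    """
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    raw = source["someTime"].replace("Z", "+00:00")
    ts = datetime.fromisoformat(raw).astimezone(timezone.utc)
    millis = (
        ts.hour * 3_600_000
        + ts.minute * 60_000
        + ts.second * 1_000
        + ts.microsecond // 1_000
    )
    return {**source, "millisOfDay": millis}


if __name__ == "__main__":
    doc = {"someTime": "2015-10-05T05:34:12.276Z"}
    print(with_millis_of_day(doc))
    # {'someTime': '2015-10-05T05:34:12.276Z', 'millisOfDay': 20052276}
```

The enriched dictionary would then be indexed as usual, with millisOfDay mapped as a numeric field.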

Then you can use this query:

POST myIndex/_search
{
   "query": {
      "range": {
          "someTime": {
              "gt": "now"
          }
      }
   },
   "sort": {
      "millisOfDay": {
         "order": "asc"
      }
   }
}

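Note that I've added a query to select only the documents whose someTime date is after now, so you'll get all future documents, sorted by ascending millisOfDay, i.e. by time of day. As long as the matching documents fall on the same day, this means you'll get the one nearest to now first; documents from later days are ordered by their time of day alone.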
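Because the sort key is only the time of day, it helps to check what order this query produces. The Python snippet below is a local model of the range query plus sort above, not a call to Elasticsearch; the sample documents and the helper names millis_of_day and future_sorted_by_time_of_day are hypothetical. It shows that a document early tomorrow sorts ahead of one late today.

```python
from datetime import datetime, timezone


def millis_of_day(ts: datetime) -> int:
    """Milliseconds since midnight (UTC), like the indexed millisOfDay field."""
    ts = ts.astimezone(timezone.utc)
    return ((ts.hour * 60 + ts.minute) * 60 + ts.second) * 1000 + ts.microsecond // 1000


def future_sorted_by_time_of_day(docs: list, now: datetime) -> list:
    """Keep docs with someTime > now (the range query), then sort by millisOfDay."""
    matches = [d for d in docs if d["someTime"] > now]
    return sorted(matches, key=lambda d: millis_of_day(d["someTime"]))


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2015, 10, 5, 12, 0, tzinfo=utc)
    docs = [
        {"id": "past", "someTime": datetime(2015, 10, 5, 9, 0, tzinfo=utc)},
        {"id": "today-late", "someTime": datetime(2015, 10, 5, 23, 0, tzinfo=utc)},
        {"id": "tomorrow-early", "someTime": datetime(2015, 10, 6, 1, 0, tzinfo=utc)},
    ]
    print([d["id"] for d in future_sorted_by_time_of_day(docs, now)])
    # ['tomorrow-early', 'today-late']
```

If you need strictly chronological order across days, restricting the query to a single day (as suggested earlier) keeps the millisOfDay sort meaningful.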

Update

If someTime has the format HH:mm, then you can also store its millisOfDay value, e.g. if someTime = 17:30 then millisOfDay would be (17h * 3600000 ms) + (30 min * 60000 ms) = 63000000
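The same arithmetic for an HH:mm value fits in a short Python function. It is a sketch with a hypothetical name (hhmm_to_millis_of_day), and it assumes a well-formed 24-hour HH:mm string.

```python
def hhmm_to_millis_of_day(value: str) -> int:
    """Convert an "HH:mm" string (24-hour clock) to milliseconds since midnight."""
    hours_str, minutes_str = value.split(":")
    hours, minutes = int(hours_str), int(minutes_str)
    if not (0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid HH:mm time: {value!r}")
    return hours * 3_600_000 + minutes * 60_000


if __name__ == "__main__":
    print(hhmm_to_millis_of_day("17:30"))  # 63000000
```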

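Because an HH:mm value carries no date, the range query against now no longer applies. Instead, rework the query a little with a script filter that compares millisOfDay with the current time of day, like this: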

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc.millisOfDay.value > new DateTime().millisOfDay"
        }
      }
    }
  },
  "sort": {
    "millisOfDay": {
      "order": "asc"
    }
  }
}
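To see what the script filter and sort select, here is a local Python model (again, not Elasticsearch code). The helper upcoming_today and the sample documents are hypothetical, and the current time of day is passed in explicitly rather than read from the clock, so the result is deterministic.

```python
def upcoming_today(docs: list, now_millis_of_day: int) -> list:
    """Mimic the script filter and sort above.

    Keep documents whose millisOfDay is strictly greater than the current
    time of day, then order them by millisOfDay ascending.
    """
    later = [d for d in docs if d["millisOfDay"] > now_millis_of_day]
    return sorted(later, key=lambda d: d["millisOfDay"])


if __name__ == "__main__":
    docs = [
        {"someTime": "09:15", "millisOfDay": 33_300_000},
        {"someTime": "17:30", "millisOfDay": 63_000_000},
        {"someTime": "13:45", "millisOfDay": 49_500_000},
    ]
    now = 12 * 3_600_000  # pretend it is 12:00
    print([d["someTime"] for d in upcoming_today(docs, now)])  # ['13:45', '17:30']
```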
