Solr - count documents in the range of two date fields

Question

Here are some example Solr documents I got:

{
  "id": "1",
  "openDate": "2017-12-01T00:00:00.000Z",
  "closeDate": "2017-12-04T00:00:00.000Z"
},
{
  "id": "2",
  "openDate": "2017-12-02T00:00:00.000Z",
  "closeDate": "2017-12-04T00:00:00.000Z"
},
{
  "id": "3",
  "openDate": "2017-12-02T00:00:00.000Z",
  "closeDate": "2017-12-06T00:00:00.000Z" 
}

The dates that a document is "active" are the dates between the openDate (inclusive) and the closeDate (exclusive). I want to count the number of documents that are "active" on each day, so the output should be:

[
  {
    Date: 2017-12-01,
    count: 1
  },
  {
    Date: 2017-12-02,
    count: 3
  },
  {
    Date: 2017-12-03,
    count: 3
  },
  {
    Date: 2017-12-04,
    count: 1
  },
  {
    Date: 2017-12-05,
    count: 1
  }
]

One easy approach to solve this is to keep a multi-valued date field (say called openDates) with all the dates in the range of interest, so we expand the documents like this:

  {
    "id": "1",
    "openDate": "2017-12-01T00:00:00.000Z",
    "closeDate": "2017-12-04T00:00:00.000Z",
    "openDates": ["2017-12-01T00:00:00.000Z",
                  "2017-12-02T00:00:00.000Z",
                  "2017-12-03T00:00:00.000Z"]
  },
  {
    "id": "2",
    "openDate": "2017-12-02T00:00:00.000Z",
    "closeDate": "2017-12-04T00:00:00.000Z",
    "openDates": ["2017-12-02T00:00:00.000Z",
                  "2017-12-03T00:00:00.000Z"]    
  },
  {
    "id": "3",
    "openDate": "2017-12-02T00:00:00.000Z",
    "closeDate": "2017-12-06T00:00:00.000Z",
    "openDates": ["2017-12-02T00:00:00.000Z",
                  "2017-12-03T00:00:00.000Z",
                  "2017-12-04T00:00:00.000Z",
                  "2017-12-05T00:00:00.000Z"]    
  }
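
As a rough illustration of that expansion step, here is a minimal Python sketch. The helper names (parse_solr_date, format_solr_date, expand_open_dates) are made up for this example and are not part of Solr or any client library; the sketch assumes the dates are UTC strings in the format shown above.

```python
from datetime import datetime, timedelta

SOLR_DATE = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches e.g. 2017-12-01T00:00:00.000Z


def parse_solr_date(text):
    """Parse a UTC timestamp string in the format used by the documents."""
    return datetime.strptime(text, SOLR_DATE)


def format_solr_date(moment):
    """Format a naive datetime back to the same millisecond-precision string."""
    return moment.isoformat(timespec="milliseconds") + "Z"


def expand_open_dates(doc, step=timedelta(days=1)):
    """List every step from openDate (inclusive) up to closeDate (exclusive)."""
    current = parse_solr_date(doc["openDate"])
    close = parse_solr_date(doc["closeDate"])
    dates = []
    while current < close:
        dates.append(format_solr_date(current))
        current += step
    return dates


doc = {
    "id": "1",
    "openDate": "2017-12-01T00:00:00.000Z",
    "closeDate": "2017-12-04T00:00:00.000Z",
}
doc["openDates"] = expand_open_dates(doc)
```

Passing a smaller step such as timedelta(hours=1) shows how quickly the list grows at finer granularity.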

Then I can run a facet query like this:

/select?q=*:*&facet=true&facet.field=openDates&rows=0

to get the counts I need.
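
To make explicit what that field facet counts, here is a small local model in Python. It does not talk to Solr; facet_counts and day are hypothetical helpers written for this sketch, and the model simply counts, for each distinct value of the field, how many documents contain it.

```python
from collections import Counter


def day(n):
    """Build a December 2017 timestamp string for day n."""
    return f"2017-12-{n:02d}T00:00:00.000Z"


docs = [
    {"id": "1", "openDates": [day(1), day(2), day(3)]},
    {"id": "2", "openDates": [day(2), day(3)]},
    {"id": "3", "openDates": [day(2), day(3), day(4), day(5)]},
]


def facet_counts(documents, field):
    """Count how many documents contain each distinct value of a field."""
    counts = Counter()
    for document in documents:
        # A document counts once per value, even if the value is repeated.
        counts.update(set(document.get(field, [])))
    return dict(sorted(counts.items()))


counts = facet_counts(docs, "openDates")
```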

Is there a better way to solve this in Solr?

Ideally, an alternate approach can help bucket by hour or minute, not just days. The above approach will have a very large multi-valued field if we go more granular. Also, is there a good way to fill the holes (i.e. missing dates) with zero counts?

Recommended answer

DateRangeField comes to the rescue here. In the schema, add something like this:

<fieldType name="range_date" class="solr.DateRangeField" />
<field name="active" type="range_date" indexed="true" stored="false"/>

You can specify the active range like this:

doc1.addField("active", "[2017-12-01T00:00:00.000Z TO 2017-12-04T00:00:00.000Z]")
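
If the documents are prepared in a script before indexing, the range string can be derived from the existing openDate and closeDate values. A minimal Python sketch follows; active_range is a hypothetical helper, not a Solr or client API, and it only builds the "[start TO end]" string used above.

```python
from datetime import datetime

SOLR_DATE = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches e.g. 2017-12-01T00:00:00.000Z


def active_range(open_date, close_date):
    """Build the "[start TO end]" string for the range field."""
    opened = datetime.strptime(open_date, SOLR_DATE)
    closed = datetime.strptime(close_date, SOLR_DATE)
    if opened > closed:
        raise ValueError("openDate must not be after closeDate")
    return f"[{open_date} TO {close_date}]"


doc = {
    "id": "1",
    "openDate": "2017-12-01T00:00:00.000Z",
    "closeDate": "2017-12-04T00:00:00.000Z",
}
doc["active"] = active_range(doc["openDate"], doc["closeDate"])
```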

Then request range facets on this field.

Example parameters with 1-day granularity (change the gap parameter for other granularities):

      q.add("facet", "true")
      q.add("facet.range", "active")
      q.add("facet.range.start", "NOW/MONTH")
      q.add("facet.range.end", "NOW/MONTH+1MONTH")
      q.add("facet.range.include", "outer")
      q.add("facet.range.gap", "+1DAY")
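
When the same parameters are sent over plain HTTP instead of through a client object, they need URL encoding; in particular the "+" in the gap value must be percent-encoded, otherwise it is decoded as a space. A minimal Python sketch, assuming the /select handler, q=*:* and rows=0 from the question (those three are not part of the parameter list above):

```python
from urllib.parse import urlencode, parse_qsl

params = [
    ("q", "*:*"),   # assumption: match-all query, as in the question
    ("rows", "0"),  # assumption: only facet counts are wanted
    ("facet", "true"),
    ("facet.range", "active"),
    ("facet.range.start", "NOW/MONTH"),
    ("facet.range.end", "NOW/MONTH+1MONTH"),
    ("facet.range.include", "outer"),
    ("facet.range.gap", "+1DAY"),
]

# urlencode percent-encodes "+", "/", "*" and ":" in the values.
query_string = urlencode(params)
request_path = "/select?" + query_string  # assumption: default select handler
```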

I've added facet.range.include=outer so that the response matches the format you asked for (the buckets include neither their upper nor their lower bounds). You can change this parameter to whichever value suits you better.

You will get exactly what you need:

2017-12-01T00:00:00Z  1
2017-12-02T00:00:00Z  3
2017-12-03T00:00:00Z  3
2017-12-04T00:00:00Z  1
2017-12-05T00:00:00Z  1
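
To sanity-check these counts without a running Solr instance, here is a small local model in Python. It is not how Solr computes range facets internally; it only assumes that a document falls into a bucket when its stored range overlaps the bucket, with both bucket edges excluded. count_active is a hypothetical helper, and the fixed start and end dates stand in for the NOW/MONTH expressions.

```python
from datetime import datetime, timedelta

# (id, openDate, closeDate) for the three example documents
docs = [
    ("1", datetime(2017, 12, 1), datetime(2017, 12, 4)),
    ("2", datetime(2017, 12, 2), datetime(2017, 12, 4)),
    ("3", datetime(2017, 12, 2), datetime(2017, 12, 6)),
]


def count_active(documents, start, end, gap):
    """Count documents whose range overlaps each bucket (edges excluded)."""
    counts = {}
    bucket_start = start
    while bucket_start < end:
        bucket_end = bucket_start + gap
        counts[bucket_start.isoformat()] = sum(
            1
            for _, opened, closed in documents
            if opened < bucket_end and closed > bucket_start
        )
        bucket_start = bucket_end
    return counts


counts = count_active(
    docs, datetime(2017, 12, 1), datetime(2017, 12, 7), timedelta(days=1)
)
```

In this model, buckets that no document overlaps (here 2017-12-06) still appear with a count of zero, and a smaller gap such as timedelta(hours=1) gives hourly buckets without storing anything extra per document.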
