Solr - count documents in the range of two date fields
Question
Here are some example Solr documents:
{
"id": "1",
"openDate": "2017-12-01T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z"
},
{
"id": "2",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z"
},
{
"id": "3",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-06T00:00:00.000Z"
}
A document is "active" on every date from its openDate (inclusive) up to its closeDate (exclusive). I want to count the number of documents that are active on each day, so the output should be:
[
{
Date: 2017-12-01,
count: 1
},
{
Date: 2017-12-02,
count: 3
},
{
Date: 2017-12-03,
count: 3
},
{
Date: 2017-12-04,
count: 1
},
{
Date: 2017-12-05,
count: 1
}
]
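As a cross-check of the expected counts above, here is a minimal plain-Java sketch of the half-open rule (openDate <= day < closeDate), with no Solr involved. The class name `ActiveCounts`, the `Doc` holder and the helper names are hypothetical, and the timestamps are reduced to calendar dates for brevity.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ActiveCounts {
    /** Hypothetical holder for one document's active interval [open, close). */
    static final class Doc {
        final String id;
        final LocalDate open;
        final LocalDate close;

        Doc(String id, String open, String close) {
            this.id = id;
            this.open = LocalDate.parse(open);
            this.close = LocalDate.parse(close);
        }
    }

    /** The three sample documents from the question, reduced to calendar dates. */
    static List<Doc> sampleDocs() {
        return List.of(
            new Doc("1", "2017-12-01", "2017-12-04"),
            new Doc("2", "2017-12-02", "2017-12-04"),
            new Doc("3", "2017-12-02", "2017-12-06"));
    }

    /** For each day in [from, to), counts documents with open <= day < close. */
    static Map<LocalDate, Integer> countPerDay(List<Doc> docs, LocalDate from, LocalDate to) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (LocalDate day = from; day.isBefore(to); day = day.plusDays(1)) {
            int active = 0;
            for (Doc d : docs) {
                if (!day.isBefore(d.open) && day.isBefore(d.close)) {
                    active++;
                }
            }
            counts.put(day, active); // days with no active document get an explicit 0
        }
        return counts;
    }

    public static void main(String[] args) {
        countPerDay(sampleDocs(), LocalDate.parse("2017-12-01"), LocalDate.parse("2017-12-07"))
            .forEach((day, n) -> System.out.println(day + " " + n));
    }
}
```

Because the loop visits every day in the requested window, days on which nothing is active (2017-12-06 here) appear with a count of 0 instead of being skipped.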
One easy way to solve this is to keep a multi-valued date field (say, openDates) holding every date on which the document is active, so we expand the documents like this:
{
"id": "1",
"openDate": "2017-12-01T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z",
"openDates": ["2017-12-01T00:00:00.000Z",
"2017-12-02T00:00:00.000Z",
"2017-12-03T00:00:00.000Z"]
},
{
"id": "2",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z",
"openDates": ["2017-12-02T00:00:00.000Z",
"2017-12-03T00:00:00.000Z"]
},
{
"id": "3",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-06T00:00:00.000Z",
"openDates": ["2017-12-02T00:00:00.000Z",
"2017-12-03T00:00:00.000Z",
"2017-12-04T00:00:00.000Z",
"2017-12-05T00:00:00.000Z"]
}
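The expansion shown above could be generated at index time along these lines. This is only a sketch: `OpenDatesExpander` and `expand` are hypothetical names, and the `step` parameter is added here to show how the field grows at finer granularities. Note that `Instant.toString()` omits the `.000` milliseconds shown in the sample documents.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class OpenDatesExpander {
    /** Lists every bucket start in [openDate, closeDate), stepping by one `step` unit. */
    static List<String> expand(String openDate, String closeDate, ChronoUnit step) {
        Instant close = Instant.parse(closeDate);
        List<String> values = new ArrayList<>();
        for (Instant t = Instant.parse(openDate); t.isBefore(close); t = t.plus(1, step)) {
            values.add(t.toString()); // e.g. 2017-12-01T00:00:00Z
        }
        return values;
    }

    public static void main(String[] args) {
        String open = "2017-12-01T00:00:00.000Z";
        String close = "2017-12-04T00:00:00.000Z";
        System.out.println(expand(open, close, ChronoUnit.DAYS));
        // The same three-day document needs far more values at finer granularity.
        System.out.println(expand(open, close, ChronoUnit.HOURS).size());
        System.out.println(expand(open, close, ChronoUnit.MINUTES).size());
    }
}
```

For document 1 this yields 3 values per day, 72 per hour and 4320 per minute, which is the growth the multi-valued approach runs into.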
Then I can run a facet query like this:
/select?q=*:*&facet=true&facet.field=openDates&rows=0
to get the counts I need.
Is there a better way to solve this in Solr?
Ideally, an alternative approach would also allow bucketing by hour or minute, not just by day. With the approach above, the multi-valued field becomes very large at finer granularities. Also, is there a good way to fill the holes (i.e. missing dates) with zero counts?
Recommended answer
DateRangeField comes to the rescue. In the schema, add something like this:
<fieldType name="range_date" class="solr.DateRangeField" />
<field name="active" type="range_date" indexed="true" stored="false"/>
You can specify the active range like this:
doc1.addField("active", "[2017-12-01T00:00:00.000Z TO 2017-12-04T00:00:00.000Z]")
and later request range facets on this field.
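For the indexing step, the range value can be built from the two existing fields. The sketch below is an assumption-laden helper, not part of SolrJ: `ActiveRange`, `inclusive` and `closeExclusive` are hypothetical names. The second variant trims one millisecond off closeDate in case the close instant must not count as active; whether that is needed depends on how the facet boundaries behave in your setup, so check it against real facet output.

```java
import java.time.Instant;

public class ActiveRange {
    /** Builds the range exactly as in the answer: both ends inclusive. */
    static String inclusive(String openDate, String closeDate) {
        return "[" + openDate + " TO " + closeDate + "]";
    }

    /**
     * Assumption: to treat closeDate as exclusive, end the range one
     * millisecond earlier. Verify against your own facet results.
     */
    static String closeExclusive(String openDate, String closeDate) {
        Instant lastActive = Instant.parse(closeDate).minusMillis(1);
        return "[" + openDate + " TO " + lastActive + "]";
    }

    public static void main(String[] args) {
        String open = "2017-12-01T00:00:00.000Z";
        String close = "2017-12-04T00:00:00.000Z";
        System.out.println(inclusive(open, close));
        System.out.println(closeExclusive(open, close));
    }
}
```

Either string can be passed as the value of the `active` field when adding the document.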
Example params with 1-day granularity (change the gap param for other granularities, such as +1HOUR or +1MINUTE):
q.add("facet", "true")
q.add("facet.range", "active")
q.add("facet.range.start", "NOW/MONTH")
q.add("facet.range.end", "NOW/MONTH+1MONTH")
q.add("facet.range.include", "outer")
q.add("facet.range.gap", "+1DAY")
I've added facet.range.include=outer to keep the response in exactly the format you asked for (not including upper and lower bounds). You can change this parameter to whichever value suits you better.
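If you send the request over plain HTTP instead of through a client library, the same parameters have to be URL-encoded; in particular the `+` in the gap must become `%2B`, otherwise it is read as a space. A minimal sketch, assuming the `/select` handler from the question and adding `q=*:*` and `rows=0` from the question's query (the class and method names are hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FacetQueryString {
    /** The answer's facet parameters, plus q and rows taken from the question. */
    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("rows", "0");
        p.put("facet", "true");
        p.put("facet.range", "active");
        p.put("facet.range.start", "NOW/MONTH");
        p.put("facet.range.end", "NOW/MONTH+1MONTH");
        p.put("facet.range.include", "outer");
        p.put("facet.range.gap", "+1DAY");
        return p;
    }

    /** Joins the parameters into an encoded query string, preserving order. */
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("/select?" + toQueryString(params()));
    }
}
```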
You will get exactly what you need:
2017-12-01T00:00:00Z: 1
2017-12-02T00:00:00Z: 3
2017-12-03T00:00:00Z: 3
2017-12-04T00:00:00Z: 1
2017-12-05T00:00:00Z: 1