MongoDB - What is the fastest way to get the latest value as-of a given date?
Question
I have a collection of measurements from different sources, which come at different frequencies.
How do I get the latest good value as-of a specific date, for any given subset of sources? (This is similar to pandas.Index.asof.)
To be clear, some of these time series may have no value on the desired date, so I must find the most recent of the available dates that are earlier than the query date.
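The as-of semantics described above can be pinned down with a small in-memory Python sketch, independent of MongoDB. The data layout (a dict mapping each source to a date-sorted list of (date, value) tuples) and the name asof_latest are hypothetical; the strict "earlier than" comparison follows the question.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: source -> list of (date, value), sorted by date ascending
Series = List[Tuple[datetime, float]]


def asof_latest(series_by_source: Dict[str, Series],
                sources: List[str],
                query_date: datetime) -> Dict[str, Optional[Tuple[datetime, float]]]:
    """For each requested source, return the (date, value) pair with the latest
    date strictly earlier than query_date, or None if there is none."""
    result = {}
    for source in sources:
        series = series_by_source.get(source, [])
        # (query_date,) sorts before any (query_date, value) tuple, so this is
        # the index of the first entry whose date is >= query_date
        i = bisect_left(series, (query_date,))
        result[source] = series[i - 1] if i > 0 else None
    return result


series = {
    "daily": [(datetime(2024, 1, 1), 1.0), (datetime(2024, 1, 2), 2.0), (datetime(2024, 1, 3), 3.0)],
    "Monday": [(datetime(2024, 1, 1), 10.0)],
}
print(asof_latest(series, ["daily", "Monday"], datetime(2024, 1, 3)))
```

Each lookup is a binary search per source, which is the same "find the last entry before the date" step the MongoDB queries below perform against an index.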
The timeseries could look like this:
{_id: new ObjectId(), source: "1stDayofMonth", date: new ISODate(<day1>), value: somevalue}
{_id: new ObjectId(), source: "Monday", date: new ISODate(<day1>), value: somevalue}
{_id: new ObjectId(), source: "daily", date: new ISODate(<day1>), value: somevalue}
// ...
{_id: new ObjectId(), source: "daily", date: new ISODate(<dayN>), value: somevalue}
{_id: new ObjectId(), source: "Wednesday", date: new ISODate(<dayN>), value: somevalue}
// and so on...
Given proper indexing (db.myCollection.createIndex({date:1, source:1})), how can I get the latest good value as-of a given queryDate, for any subset of sources?
This is how far I got, but this solution fails to return just 1 value per source (if you read the code, you'll see this would work when querying on just one source, but when querying on several it returns more than 1 value for the high-frequency sources):
querySources = ['1stDayofMonth', 'Monday']  # as an example, let's say I want only these 2 sources
nSources = len(querySources)
cursor = (db.myCollection
          .find({'source': {'$in': querySources}, 'date': {'$lt': queryDate}})
          .sort('date', -1)  # pymongo syntax: newest first
          .limit(nSources))
Any ideas?
I should have mentioned that the docs point to this solution, but aggregate might be very slow, and the collection is large enough that query times become long (say, querying 1000 sources, each with 10,000 days of data).
Answer
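You're getting more than one result per source because limit(nSources) caps the total number of documents returned, not the number per source. With nSources = 2 you get the two most recent matching documents overall, and both can easily come from the same high-frequency source.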
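A small pure-Python simulation makes this failure mode concrete. The sample data and the name sort_then_limit are hypothetical; the function only mimics the filter, sort and limit of the query in the question. Because the sort and limit run over all matching documents together, the weekly "Monday" source fills both slots and "1stDayofMonth" is crowded out.

```python
from datetime import datetime, timedelta

# Hypothetical sample documents: a weekly "Monday" source and a monthly one
# (2024-01-01 is a Monday)
docs = [{"source": "Monday", "date": datetime(2024, 1, 1) + timedelta(weeks=k), "value": k}
        for k in range(9)]
docs += [{"source": "1stDayofMonth", "date": datetime(2024, m, 1), "value": 100 + m}
         for m in (1, 2, 3)]


def sort_then_limit(docs, sources, query_date, n):
    """Mimic find({source: {$in: sources}, date: {$lt: query_date}})
    .sort(date, -1).limit(n): the limit applies to the whole result set."""
    matching = [d for d in docs if d["source"] in sources and d["date"] < query_date]
    matching.sort(key=lambda d: d["date"], reverse=True)
    return matching[:n]


top = sort_then_limit(docs, ["1stDayofMonth", "Monday"], datetime(2024, 2, 20), n=2)
print([d["source"] for d in top])  # ['Monday', 'Monday']
```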
You have to use aggregate if you want to group by source, or you have to run one find() per source and then join the results.
Solution using aggregate:
db.myCollection.aggregate([
    // Only the requested sources, and only dates before the as-of date
    {$match : {source: {$in: ["1stDayofMonth", "Monday"]},
               date: {$lt: queryDate}}},
    // Newest first, so $first below picks the latest document per source
    {$sort : {date : -1}},
    {$group : {
        _id : "$source",
        date : {$first : "$date"},
        value : {$first : "$value"}
    }}
])
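Since the question uses Python, here is a sketch of the same pipeline built for pymongo for an arbitrary list of sources. The function name build_asof_pipeline is hypothetical; the stages mirror the shell version above, with the $match conditions in a single stage.

```python
from datetime import datetime
from typing import Any, Dict, List


def build_asof_pipeline(sources: List[str], query_date: datetime) -> List[Dict[str, Any]]:
    """Aggregation pipeline returning, per source, the latest document dated
    strictly before query_date."""
    return [
        # Restrict to the requested sources and to dates before the as-of date
        {"$match": {"source": {"$in": sources}, "date": {"$lt": query_date}}},
        # Newest first, so $first inside each group picks the latest document
        {"$sort": {"date": -1}},
        {"$group": {
            "_id": "$source",
            "date": {"$first": "$date"},
            "value": {"$first": "$value"},
        }},
    ]


pipeline = build_asof_pipeline(["1stDayofMonth", "Monday"], datetime(2024, 2, 20))
print(pipeline)
```

With pymongo it would be run as list(db.myCollection.aggregate(pipeline)). If the collection has an index on {source: 1, date: -1}, sorting on {"source": 1, "date": -1} instead may let MongoDB serve the $first of each group from the index rather than sorting every matching document; whether that optimization applies is worth confirming with explain().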
Solution using find():
curs1 = db.myCollection.find({'source': "1stDayofMonth",
                              'date': {'$lt': queryDate}})
               .sort({date: -1}).limit(1);
curs2 = db.myCollection.find({'source': "Monday",
                              'date': {'$lt': queryDate}})
               .sort({date: -1}).limit(1);
// Now add the result from each cursor to an array in your app
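The per-source approach generalizes to any list of sources with a loop. This is a sketch assuming a pymongo Collection (any object with a compatible find_one works); the name latest_per_source is hypothetical. pymongo's find_one forwards extra keyword arguments such as sort to find(). Each query has an equality on source and a sort on date, so an index on {source: 1, date: -1} (rather than the {date: 1, source: 1} index from the question) should let each one stop after a single index entry; that is an assumption worth checking with explain().

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def latest_per_source(collection, sources: List[str],
                      query_date: datetime) -> Dict[str, Optional[Dict[str, Any]]]:
    """Run one find_one per source and collect the results in a dict.
    A source with no document before query_date maps to None."""
    results = {}
    for source in sources:
        results[source] = collection.find_one(
            {"source": source, "date": {"$lt": query_date}},
            sort=[("date", -1)],  # newest first, so the single returned doc is the latest
        )
    return results
```

Usage with pymongo would look like latest_per_source(db.myCollection, ["1stDayofMonth", "Monday"], query_date). This costs one round trip per source, which is the trade-off against the single aggregate call.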