MongoDB - What is the fastest way to get the latest value as-of a given date?

Problem description

I have a collection of measurements from different sources, which come at different frequencies.

How do I get the latest good value as-of a specific date, for any given subset of sources? (This is similar to pandas.Index.asof.)

To be clear, some of these time series may have no value on the desired date, so I must find the most recent of the available dates that are earlier than the query date.
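The as-of rule itself is easy to state on a single sorted series: take the last date strictly before the query date. A minimal pure-Python sketch using the standard bisect module (the function name value_as_of and the sample data are hypothetical, for illustration only):

```python
from bisect import bisect_left
from datetime import datetime


def value_as_of(dates, values, query_date):
    """Return (date, value) for the latest entry strictly before query_date.

    Assumes `dates` is sorted ascending and values[i] belongs to dates[i].
    Returns None when no entry is earlier than query_date.
    """
    # bisect_left gives the index of the first date >= query_date,
    # so the entry just before it is the latest date < query_date.
    i = bisect_left(dates, query_date)
    if i == 0:
        return None
    return dates[i - 1], values[i - 1]


# Hypothetical weekly ("Monday") series
monday_dates = [datetime(2024, 1, 1), datetime(2024, 1, 8), datetime(2024, 1, 15)]
monday_values = [10.0, 11.5, 9.8]

print(value_as_of(monday_dates, monday_values, datetime(2024, 1, 10)))
```

It uses bisect_left so that an entry exactly on the query date is excluded, matching the $lt used in the queries in this thread; pandas.Index.asof returns the label itself when it is present, which would correspond to bisect_right instead.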

The timeseries could look like this:

{_id: new ObjectId(), source: "1stDayofMonth", date: new ISODate(<day1>), value: somevalue}
{_id: new ObjectId(), source: "Monday", date: new ISODate(<day1>), value: somevalue}
{_id: new ObjectId(), source: "daily", date: new ISODate(<day1>), value: somevalue}
// ...
{_id: new ObjectId(), source: "daily", date: new ISODate(<dayN>), value: somevalue}
{_id: new ObjectId(), source: "Wednesday", date: new ISODate(<dayN>), value: somevalue}
// and so on...
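To experiment with the queries in this thread without real data, here is a minimal Python sketch that builds documents of this shape. The reporting schedules are inferred from the source names and are assumptions; value is a placeholder counter, and _id is left out because MongoDB generates it on insert. The helper name make_sample_docs is hypothetical.

```python
from datetime import datetime, timedelta


def make_sample_docs(start, days):
    """Build hypothetical measurement documents shaped like the ones above."""
    # Assumed schedules: which days each source reports on
    schedules = {
        "daily": lambda d: True,
        "Monday": lambda d: d.weekday() == 0,
        "Wednesday": lambda d: d.weekday() == 2,
        "1stDayofMonth": lambda d: d.day == 1,
    }
    docs = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        for source, fires_on in schedules.items():
            if fires_on(day):
                # value is just the day offset, a placeholder measurement
                docs.append({"source": source, "date": day, "value": float(offset)})
    return docs


docs = make_sample_docs(datetime(2024, 1, 1), 31)
```

With pymongo, such a list could be loaded with db.myCollection.insert_many(docs).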

Given proper indexing (db.myCollection.createIndex({date:1, source:1})), how can I get the latest good value as-of a given queryDate, for any subset of sources?

This is how far I got, but this solution fails to return exactly one value per source. If you read the code, you'll see it works when querying a single source, but when querying several sources it returns more than one value for the higher-frequency ones:

# Assumes `db` is a pymongo Database and `queryDate` is a datetime
querySources = ['1stDayofMonth', 'Monday']  # as an example, let's say I want only these 2 sources
nSources = len(querySources)
cursor = db.myCollection.find({'source': {'$in': querySources}, 'date': {'$lt': queryDate}}).sort('date', -1).limit(nSources)

Any ideas?

I should have mentioned that the docs point to an aggregate solution for this, but aggregate might be very slow, and the collection is large enough that query times become long (say, querying 1000 sources, each with 10,000 days of data).

Recommended answer

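You're getting more than one result per source because limit(nSources) applies to the combined, date-sorted result of all matched sources, not to each source separately. When nSources is larger than 1, the most recent documents can all come from a single high-frequency source.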
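To make the cause concrete, this plain-Python sketch (hypothetical data, no database needed) mimics what sort-then-limit does when a weekly and a monthly source are matched together:

```python
from datetime import datetime

# Hypothetical documents: "Monday" reports weekly, "1stDayofMonth" monthly.
docs = [
    {"source": "1stDayofMonth", "date": datetime(2024, 1, 1), "value": 1.0},
    {"source": "Monday", "date": datetime(2024, 1, 15), "value": 2.0},
    {"source": "Monday", "date": datetime(2024, 1, 22), "value": 3.0},
    {"source": "Monday", "date": datetime(2024, 1, 29), "value": 4.0},
]
query_date = datetime(2024, 1, 31)
query_sources = ["1stDayofMonth", "Monday"]

# Emulates find({source: {$in: ...}, date: {$lt: ...}}).sort(date, -1).limit(n):
# the limit applies to the whole result set, not to each source separately.
matching = [d for d in docs if d["source"] in query_sources and d["date"] < query_date]
matching.sort(key=lambda d: d["date"], reverse=True)
top_n = matching[: len(query_sources)]

print([(d["source"], d["date"].day) for d in top_n])
```

Both slots go to the weekly Monday source, and the 1stDayofMonth document, although it is the latest value for its own source, never makes it into the result.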

To get exactly one result per source, you either use aggregate to group by source, or run one find() per source and then combine the results.

Solution using aggregate:

db.myCollection.aggregate([
    // Keep only the requested sources, strictly before queryDate
    {$match: {source: {$in: ["1stDayofMonth", "Monday"]}, date: {$lt: queryDate}}},
    // Newest first, so $first below picks the latest document per source
    {$sort: {date: -1}},
    {$group: {
        _id: "$source",
        date: {$first: "$date"},
        value: {$first: "$value"}
    }}
])
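Since the question's code is in Python, here is a sketch of the same pipeline written as pymongo-style dicts, plus a hypothetical plain-Python emulation of its three stages so the per-source result can be checked without a server. The function names build_asof_pipeline and emulate_asof_pipeline and the sample data are illustrative assumptions.

```python
from datetime import datetime


def build_asof_pipeline(sources, query_date):
    """Python-dict version of the shell pipeline above (same stages)."""
    return [
        # Only the requested sources, strictly before the query date
        {"$match": {"source": {"$in": list(sources)}, "date": {"$lt": query_date}}},
        # Newest first, so $first below picks the latest document per source
        {"$sort": {"date": -1}},
        {"$group": {
            "_id": "$source",
            "date": {"$first": "$date"},
            "value": {"$first": "$value"},
        }},
    ]


def emulate_asof_pipeline(docs, sources, query_date):
    """Apply $match, $sort and $group/$first to a list, keyed by source."""
    matched = [d for d in docs if d["source"] in sources and d["date"] < query_date]
    matched.sort(key=lambda d: d["date"], reverse=True)
    latest = {}
    for d in matched:
        # setdefault keeps only the first (= latest) document seen per source
        latest.setdefault(d["source"], {"_id": d["source"], "date": d["date"], "value": d["value"]})
    return latest


docs = [
    {"source": "1stDayofMonth", "date": datetime(2024, 1, 1), "value": 1.0},
    {"source": "Monday", "date": datetime(2024, 1, 22), "value": 3.0},
    {"source": "Monday", "date": datetime(2024, 1, 29), "value": 4.0},
]
pipeline = build_asof_pipeline(["1stDayofMonth", "Monday"], datetime(2024, 1, 31))
result = emulate_asof_pipeline(docs, ["1stDayofMonth", "Monday"], datetime(2024, 1, 31))
```

Against a real server the pipeline would be run with something like list(db.myCollection.aggregate(pipeline)). On speed, MongoDB's documentation describes a $group optimization for pipelines that sort and group by the same field and use only $first or $last: an index matching the sort order can let $group find the first document of each group quickly. Sorting on {source: 1, date: -1} with an index on {source: 1, date: -1} would be the natural candidate here, but confirm with explain() before relying on it.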

Solution using find():

// One query per source: latest document strictly before queryDate
curs1 = db.myCollection.find({source: "1stDayofMonth", date: {$lt: queryDate}})
    .sort({date: -1}).limit(1);

curs2 = db.myCollection.find({source: "Monday", date: {$lt: queryDate}})
    .sort({date: -1}).limit(1);

// Now add the result from each cursor to an array in your app
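In Python, the per-source approach becomes a loop. A minimal sketch, assuming collection is a pymongo Collection: pymongo's find_one accepts the same sort argument as find(), so each lookup returns at most one document. InMemoryCollection is a hypothetical stand-in that supports only this one call shape, so the sketch runs without a server.

```python
from datetime import datetime


def latest_per_source(collection, sources, query_date):
    """One lookup per source; returns {source: latest doc or None}."""
    results = {}
    for source in sources:
        # Newest date first; find_one returns the first match or None
        results[source] = collection.find_one(
            {"source": source, "date": {"$lt": query_date}},
            sort=[("date", -1)],
        )
    return results


class InMemoryCollection:
    """Hypothetical stand-in supporting only the find_one call used above."""

    def __init__(self, docs):
        self.docs = docs

    def find_one(self, query, sort=None):
        source = query["source"]
        before = query["date"]["$lt"]
        hits = [d for d in self.docs if d["source"] == source and d["date"] < before]
        # Only handles the sort=[("date", -1)] case used by latest_per_source
        hits.sort(key=lambda d: d["date"], reverse=True)
        return hits[0] if hits else None


coll = InMemoryCollection([
    {"source": "1stDayofMonth", "date": datetime(2024, 1, 1), "value": 1.0},
    {"source": "Monday", "date": datetime(2024, 1, 22), "value": 3.0},
    {"source": "Monday", "date": datetime(2024, 1, 29), "value": 4.0},
])
latest = latest_per_source(coll, ["1stDayofMonth", "Monday", "Wednesday"], datetime(2024, 1, 31))
```

With a real database this would be called as latest_per_source(db.myCollection, querySources, queryDate). Because each query has an equality on source and a sort on date, an index that starts with source, such as {source: 1, date: -1}, lets each lookup seek straight to the end of that source's range; the question's {date: 1, source: 1} index does not serve this shape as well. With 1000 sources this is still 1000 round trips, so it is worth measuring against the aggregate version.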
