How to iterate through every other document from a MongoDB cursor


Question

I have a MongoDB cursor with documents that I want to turn into DataFrames. However, consecutive documents in that cursor can have runTime values that are too close together. Therefore I'd like to take every other document and make a DataFrame out of each of those.

all_df_forecast = []
for doc in cursor[::2]:
    single_fc_df = pd.DataFrame(doc['data']['PRICES SPOT'])
    all_df_forecast.append(single_fc_df)
            

Result: IndexError: Cursor instances do not support slice steps

all_df_forecast = []
for doc in range(0, cursor.count(), 2):
    single_fc_df = pd.DataFrame(doc['data']['PRICES SPOT'])
    all_df_forecast.append(single_fc_df)

Result: TypeError: 'int' object is not subscriptable

This is how the cursor over the documents that contain the data is currently created:

 cursor = self._collection.find({
   "Type": "f", 
   "runTime": { "$gte": model_dt_from, "$lte": model_dt_till },
   "data.PRICES SPOT.0": { "$exists": True }
 })

Ideally the cursor itself would contain only every other document matching the query I give it. I came across skip, but from my understanding it only skips the given number of documents at the beginning. That is why I am now tackling this after I have the cursor, while creating the DataFrames for every other document.

Recommended answer

Use cursor.next() to skip over every other cursor result.

As a demonstration:

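from pymongo import MongoClient

client = MongoClient()
db = client.test

# Start from an empty collection, then insert ten documents with values 1..10
db.pytest.delete_many({})
db.pytest.insert_many([{'value': i + 1} for i in range(10)])

cursor = db.pytest.find({}, {'_id': 0})

# Note: Cursor.count() was deprecated in PyMongo 3.7 and removed in 4.0;
# on newer versions use db.pytest.count_documents({}) instead
count = cursor.count()
print(count)
cursor.next()  # skip the first document

for doc in cursor:
    print(doc)
    count -= 2
    print(count)
    if count > 0:
        cursor.next()  # skip the next document only if one remains

This returns:

10
{'value': 2}
8
{'value': 4}
6
{'value': 6}
4
{'value': 8}
2
{'value': 10}
0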

The only thing you need to be aware of when calling cursor.next() is that the cursor must actually have remaining results before you call it; otherwise the call raises an exception (StopIteration) because the cursor is depleted. For this reason you can obtain cursor.count(), decrement it as you go, and check how many documents remain before deciding to advance the cursor.
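
The count bookkeeping can also be avoided. As a minimal sketch, assuming the cursor can be consumed like any ordinary Python iterator, the built-in next() accepts a default value, so an exhausted cursor hands back a sentinel instead of raising. The helper name every_second and the list standing in for the cursor are illustrative only and not part of PyMongo.

```python
def every_second(iterable):
    """Lazily yield the 2nd, 4th, 6th, ... items of any iterable."""
    it = iter(iterable)
    missing = object()  # sentinel meaning "nothing left to read"
    while True:
        # Skip one item; the default keeps next() from raising on exhaustion
        if next(it, missing) is missing:
            return
        doc = next(it, missing)
        if doc is missing:
            return
        yield doc


# A list of dicts stands in for the cursor here (assumption for the demo)
fake_cursor = [{'value': i + 1} for i in range(10)]
kept = list(every_second(fake_cursor))
print(kept)
```

Because the sentinel check covers both the skipped and the kept document, the same loop works for result sets of odd and even length without knowing the count up front.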

Note that with an odd number of results the final cursor.next() consumes the last document and the loop then ends on its own, so the check only matters for an even number of results, where it makes sure you don't advance the cursor once 0 documents remain.

An alternative, along the lines of what you partly attempted, is to convert the cursor to a list and then take slices, but that means loading all results into memory, which is probably impractical for most result sets of a reasonable size.
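
A lazy middle ground is itertools.islice from the standard library, which accepts a step and pulls items from an iterator one at a time, so nothing is loaded up front. The sketch below assumes the cursor can be passed wherever a plain iterator is accepted; every_other and fake_cursor are hypothetical names used only for illustration.

```python
from itertools import islice


def every_other(cursor, start=0):
    """Lazily yield every second item, beginning at index `start`."""
    return islice(cursor, start, None, 2)


# An iterator over dicts stands in for the cursor (assumption for the demo)
fake_cursor = iter([{'value': i + 1} for i in range(10)])
values = [doc['value'] for doc in every_other(fake_cursor)]
print(values)
```

With start=0 this keeps the 1st, 3rd, 5th, ... documents, matching the intent of cursor[::2] in the question; start=1 keeps the 2nd, 4th, ... documents, like the cursor.next() demonstration. In the question's loop, the for statement would then iterate over every_other(cursor) and build each DataFrame from doc['data']['PRICES SPOT'] as before.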
