A better way to load MongoDB data to a DataFrame using Pandas and PyMongo?

Question

I have a 0.7 GB MongoDB database containing tweets that I'm trying to load into a dataframe. However, I get an error.

MemoryError:    

My code is the following:

from pandas import DataFrame  # pymongo collection `tweets` is assumed to exist already

cursor = tweets.find() #Where tweets is my collection
tweet_fields = ['id']
result = DataFrame(list(cursor), columns = tweet_fields)

I've tried the methods in the following answers, which at some point create a list of all the elements of the database before loading it.

  • https://stackoverflow.com/a/17805626/2297475
  • https://stackoverflow.com/a/16255680/2297475

However, in another answer which talks about list(), the person said that it's good for small data sets, because everything is loaded into memory.

In my case, I think it's the source of the error. It's too much data to be loaded into memory. What other method can I use?

Answer

I've modified my code to the following:

cursor = tweets.find(fields=['id'])
tweet_fields = ['id']
result = DataFrame(list(cursor), columns = tweet_fields)

By adding the fields parameter to the find() function, I restricted the output: instead of loading every field, only the selected fields are loaded into the DataFrame. Everything works fine now.
