Transfer large MongoDB collections to data.frame in R with rmongodb and plyr


Problem description


I get some strange results with huge collection sets when trying to transfer them as data frames from MongoDB to R with the rmongodb and plyr packages. I picked up this code from various GitHub repos and forums on the subject and adapted it for my purposes:

## load the both packages
library(rmongodb)
library(plyr)
## connect to MongoDB
mongo <- mongo.create(host="localhost")
# [1] TRUE
## get the list of the databases
mongo.get.databases(mongo)
# list of databases (with mydatabase)
## get the list of the collections of mydatabase
mongo.get.collections(mongo, db = "mydatabase")
# list of all the collections of my database
## Verify the size of mycollection
DBNS = "mycollection"
mongo.count(mongo, ns = DBNS)
# [1] 845923 documents inside "my collection"
## transform mycollection (in BSON MongoDB format) to a data frame (adapted for R)
export = data.frame(stringsAsFactors = FALSE)  # note: original had the typo "stringAsFactors"
cursor = mongo.find(mongo, DBNS)
i = 1
while(mongo.cursor.next(cursor))
{
tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
tmp.df = as.data.frame(t(unlist(tmp)), stringsAsFactors = FALSE)  # note: original had the typo "stringAsFactors"
export = rbind.fill(export, tmp.df)
i = i + 1
}
## show the size of the database "export"
dim(export)
# [1] 20585 23
## check more information on the database "export"
str(export)
# 'data.frame': 20585 obs. of 23 variables
# etc…

The transfer is not done well: there is a huge gap between the 845923 documents inside "mycollection" found in MongoDB and the 20585 observations in R.

I may not fully trust the code above. I'm not sure that the i = 1 and the i = i + 1 are useful for this loop (they may come from code that runs queries with rmongodb), since I have no specific values to attach to them. I also found the "t(unlist(tmp))" pattern strange: where does the t come from?
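For reference, here is my own toy example (not from the original code) of what that pattern does on a single two-field document:

```r
# unlist() flattens a document (a named list) into a named vector, coercing
# mixed types to character; t() turns that vector into a 1-row matrix so
# as.data.frame() produces a single-row data frame.
tmp <- list(name = "Alice", age = 30)
unlist(tmp)                                        # named character vector
as.data.frame(t(unlist(tmp)), stringsAsFactors = FALSE)
# a 1-row data frame with the columns "name" and "age"
```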

The problem is that I get big differences between the collection size in MongoDB and the data frame size in R with large collection sets (more than several thousand documents). My PC has plenty of RAM, and R seems to work fine during the process (no freeze, no crash; it takes time, but that is normal given the large conversion from BSON to list to data frame).

I have successfully transferred a MongoDB collection of 36100 documents from MongoDB to R for data analysis with no problem.

So I'm not sure where the problem is coming from.

Thanks in advance for any help on this subject.

Solution

I would say none of this is needed. You can proceed in a simpler way as follows. This requires the "rmongodb" package in R, and you need a recent version: the mongo.find.all() helper used below is not present in earlier releases. There are other packages for MongoDB as well, such as "RMongo".

To install rmongodb in R:

install.packages("rmongodb")

To convert a large MongoDB collection into a data frame in R:

library(rmongodb)
mongo <- mongo.create() # create a connection to mongodb on localhost
mongo.is.connected(mongo) # check whether mongodb is connected
mongo.get.databases(mongo) # show all databases present in mongodb
mongo.get.database.collections(mongo, "mydb") # display all collections present in database mydb
data <- mongo.find.all(mongo, "mydb.collection", data.frame = TRUE) # this suffices: it converts the entire collection into a data frame in R
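If the collection is too large to pull comfortably in one call, the same function can also be used in batches. This is only a minimal sketch, assuming mongo.find.all() accepts skip and limit arguments (check your rmongodb version) and reusing the mongo connection from above:

```r
library(rmongodb)
library(plyr)

# Page through the collection in batches of 10000 documents
# (skip/limit arguments assumed from mongo.find.all()'s signature).
batch.size <- 10000L
total <- mongo.count(mongo, ns = "mydb.collection")
pages <- list()
for (skip in seq(0L, total - 1L, by = batch.size)) {
  pages[[length(pages) + 1L]] <- mongo.find.all(
    mongo, "mydb.collection",
    skip = skip, limit = batch.size, data.frame = TRUE)
}
# rbind.fill() accepts a list of data frames and pads missing columns with NA
data <- rbind.fill(pages)
```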
