Reshape array that is stored in a collection and export to CSV
Problem description
I have a collection of Facebook Page Likes (titled pagelikes) that is stored in a Mongo database/JSON file. Below is an example of one entry.
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : [
        {
            "id" : "859302873383",
            "name" : "Hotdogs"
        },
        {
            "id" : "8593683902",
            "name" : "Video Games"
        },
        {
            "id" : "849204859849028",
            "name" : "Road Bikes"
        }
    ]
}
id = the unique Facebook Page identifier, name = the name of a Facebook page.
I would like to export this entire collection to a CSV file with three columns, user_id, page_likes.id, page_likes.name. It would look like the following:
user_id page_likes.id page_likes.name
0939bf9w9804842f9f817ad100 859302873383 Hotdogs
0939bf9w9804842f9f817ad100 8593683902 Video Games
0939bf9w9804842f9f817ad100 849204859849028 Road Bikes
... ... ...
The JSON file is quite large (4GB), contains over 120K users, and there is no limit on the number of page likes an entry has.
I have tried and failed with mongoexport, although the aggregation framework seems most useful (possibly the $project and $unwind stages). That said, I have little experience with Mongo.
Any advice, examples or suggestions would be very helpful.
Many thanks,
R
Recommended answer
You can deal with this in a number of ways.
Firstly if you have MongoDB 3.4 available then you could use a "View" in order to represent the collection with the array contents "un-wound". A "View" is basically an aggregation pipeline statement that appears to be a normal collection as far as most actions that would use a collection are concerned.
So presuming your source collection is called "pages" here, you would create the "View" with:
db.createView("pageArray", "pages", [{ "$unwind": "$page_likes" }])
Then you can query the collection as normal:
db.pageArray.find()
/* 1 */
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : {
        "id" : "859302873383",
        "name" : "Hotdogs"
    }
}
/* 2 */
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : {
        "id" : "8593683902",
        "name" : "Video Games"
    }
}
/* 3 */
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : {
        "id" : "849204859849028",
        "name" : "Road Bikes"
    }
}
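For intuition, the effect of the $unwind stage on a single document can be imitated in plain JavaScript. This is only a sketch on in-memory data: `unwind` is a hypothetical helper written for illustration, not a MongoDB API, and it assumes the stage's default behaviour of dropping documents whose array is missing, null, or empty.

```javascript
// Plain-JavaScript imitation of { "$unwind": "$page_likes" } for in-memory
// documents. `unwind` is a hypothetical helper, not a MongoDB API.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Like the stage's default behaviour, skip documents whose field is
    // missing or null (an empty array yields no rows in the loop below).
    if (value === undefined || value === null) continue;
    // A non-array value is treated as a single-element array.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      // One output document per array element, other fields copied as-is.
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

const sample = [{
  user_id: "0939bf9w9804842f9f817ad100",
  page_likes: [
    { id: "859302873383", name: "Hotdogs" },
    { id: "8593683902", name: "Video Games" },
    { id: "849204859849028", name: "Road Bikes" }
  ]
}];

const rows = unwind(sample, "page_likes");
console.log(rows.length);              // 3
console.log(rows[1].page_likes.name);  // Video Games
```

Users with no page likes would therefore not appear in the view at all, which is usually what you want for this kind of export.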
And subsequently issue the mongoexport as if it were a normal collection:
mongoexport -d test -c pageArray --type=csv --fields user_id,page_likes.id,page_likes.name
2017-07-05T13:14:11.588+1000 connected to: localhost
user_id,page_likes.id,page_likes.name
0939bf9w9804842f9f817ad100,859302873383,Hotdogs
0939bf9w9804842f9f817ad100,8593683902,Video Games
0939bf9w9804842f9f817ad100,849204859849028,Road Bikes
2017-07-05T13:14:11.589+1000 exported 3 records
Of course, add --out or a standard shell redirect to actually write the output to a file.
If your MongoDB is an older version but at least has $out available (from MongoDB 2.6), then write to another collection:
db.pages.aggregate([
{ "$unwind": "$page_likes" },
{ "$project": { "_id": 0 } },
{ "$out": "pagesArray" }
])
Then you basically run the same mongoexport as above, this time against the pagesArray collection, since it is also a regular collection that the tool can read.
If you really don't want to create either a "View" or "another collection", then you could simply send a short script to the mongo shell, albeit in a very hacky way:
mongo --quiet --eval '
print("user_id,page_likes.id,page_likes.name");
db.pages.aggregate([
{ "$unwind": "$page_likes" },
{ "$project": { "_id": 0 } },
]).forEach(p => print(`${p.user_id},${p.page_likes.id},${p.page_likes.name}`))'
Or even without aggregate() and $unwind at all:
mongo --quiet --eval '
print("user_id,page_likes.id,page_likes.name");
db.pages.find({},{ _id: 0 }).forEach(p =>
p.page_likes.forEach(l => print(`${p.user_id},${l.id},${l.name}`)))'
Which gives you the same output:
user_id,page_likes.id,page_likes.name
0939bf9w9804842f9f817ad100,859302873383,Hotdogs
0939bf9w9804842f9f817ad100,8593683902,Video Games
0939bf9w9804842f9f817ad100,849204859849028,Road Bikes
Note also that if you want or "need" a delimiter other than the comma used here, then either of the last two shell approaches is probably the way to go. Delimiter support is "scheduled" for addition to mongoexport and mongoimport with TOOLS-87, but of course is "yet to be resolved". So if you want different output, you do it yourself.
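As a rough illustration of "doing it yourself", the sketch below flattens documents shaped like the example entry into delimited text with a configurable delimiter. The helpers `quoteField` and `toDelimited` are hypothetical names introduced here, not part of mongoexport or the mongo shell, and the page name containing a comma is invented sample data. The sketch also quotes any field that contains the delimiter, which the quick print() one-liners above do not.

```javascript
// Hypothetical helpers: flatten documents shaped like the pagelikes example
// into delimited text. Not part of mongoexport or the mongo shell.
function quoteField(value, delimiter) {
  const text = String(value);
  // Quote when the field contains the delimiter, a double quote, or a newline.
  if (text.includes(delimiter) || /["\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

function toDelimited(docs, delimiter = ",") {
  const header = ["user_id", "page_likes.id", "page_likes.name"];
  const lines = [header.join(delimiter)];
  for (const doc of docs) {
    // One output line per element of the page_likes array.
    for (const like of doc.page_likes || []) {
      lines.push(
        [doc.user_id, like.id, like.name]
          .map(v => quoteField(v, delimiter))
          .join(delimiter)
      );
    }
  }
  return lines.join("\n");
}

// Invented sample data; the second name contains a comma on purpose.
const docs = [{
  user_id: "0939bf9w9804842f9f817ad100",
  page_likes: [
    { id: "859302873383", name: "Hotdogs" },
    { id: "8593683902", name: "Bikes, Road and Mountain" }
  ]
}];

console.log(toDelimited(docs, "\t")); // tab-separated, no quoting needed
console.log(toDelimited(docs));       // comma-separated, second name quoted
```

The same two functions could be pasted into a script passed to the mongo shell, with the cursor's forEach() feeding print() one line at a time instead of building a single string, which matters for a 4GB collection.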