Reshape array that is stored in a collection and export to CSV


Question

I have a collection of Facebook Page Likes (titled pagelikes) that is stored in a Mongo database/JSON file. Below is an example of one entry.

{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : [ 
        {
            "id" : "859302873383",
            "name" : "Hotdogs"
        }, 
        {
            "id" : "8593683902",
            "name" : "Video Games"
        }, 
        {
            "id" : "849204859849028",
            "name" : "Road Bikes"
        }
    ]
}

id = the unique Facebook Page identifier, name = the name of a Facebook page.

I would like to export this entire collection to a CSV file with three columns, user_id, page_likes.id, page_likes.name. It would look like the following:

user_id                     page_likes.id     page_likes.name
0939bf9w9804842f9f817ad100  859302873383      Hotdogs
0939bf9w9804842f9f817ad100  8593683902        Video Games
0939bf9w9804842f9f817ad100  849204859849028   Road Bikes
...                         ...               ...

The JSON file is quite large (4GB), contains over 120K users, and there is no limit on the number of page likes an entry can have.

I have tried and failed with mongoexport, although the aggregation framework seems most useful (possibly the $project and $unwind stages). That said, I have little experience with Mongo.

Any advice, examples or suggestions would be very helpful.

Many thanks,

R

Recommended answer

You can deal with this in a number of ways.

Firstly, if you have MongoDB 3.4 available, you could use a "View" to represent the collection with the array contents "un-wound". A "View" is basically an aggregation pipeline statement that, for most operations that use a collection, appears to be a normal collection.

So presuming your source collection is called "pages" here, then you would create the "View" with:

db.createView("pageArray", "pages", [{ "$unwind": "$page_likes" }])

Then you can query the collection as normal:

db.pageArray.find()

/* 1 */
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : {
        "id" : "859302873383",
        "name" : "Hotdogs"
    }
}

/* 2 */
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : {
        "id" : "8593683902",
        "name" : "Video Games"
    }
}

/* 3 */
{
    "_id" : ObjectId("4725bf8731b8faf4c04595bb"),
    "user_id" : "0939bf9w9804842f9f817ad100",
    "page_likes" : {
        "id" : "849204859849028",
        "name" : "Road Bikes"
    }
}
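To make the reshaping concrete, here is a minimal plain-JavaScript sketch of what $unwind does to a single document. The helper name unwindField is hypothetical and is not a MongoDB API, and the sketch only handles array fields. It runs in Node.js without a database:

```javascript
// Plain-JavaScript illustration of $unwind on one document.
// unwindField is a hypothetical helper, not part of MongoDB.
function unwindField(doc, field) {
  // A missing or empty array yields no output documents,
  // which mirrors the default behaviour of $unwind.
  const values = Array.isArray(doc[field]) ? doc[field] : [];
  // Emit one copy of the document per array element,
  // with the array replaced by that single element.
  return values.map(value => ({ ...doc, [field]: value }));
}

const entry = {
  user_id: "0939bf9w9804842f9f817ad100",
  page_likes: [
    { id: "859302873383", name: "Hotdogs" },
    { id: "8593683902", name: "Video Games" },
    { id: "849204859849028", name: "Road Bikes" }
  ]
};

const unwound = unwindField(entry, "page_likes");
console.log(unwound.length);             // 3
console.log(unwound[1].page_likes.name); // Video Games
```

Every output document keeps the parent's user_id, which is why the dotted field names page_likes.id and page_likes.name resolve to a single value per row afterwards.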

And subsequently issue the mongoexport as if it were a normal collection:

mongoexport -d test -c pageArray --type=csv --fields user_id,page_likes.id,page_likes.name
2017-07-05T13:14:11.588+1000    connected to: localhost
user_id,page_likes.id,page_likes.name
0939bf9w9804842f9f817ad100,859302873383,Hotdogs
0939bf9w9804842f9f817ad100,8593683902,Video Games
0939bf9w9804842f9f817ad100,849204859849028,Road Bikes
2017-07-05T13:14:11.589+1000    exported 3 records

Of course, add --out or a standard shell redirect to actually write the output to a file.

If your MongoDB is an older version but at least has $out available (from MongoDB 2.6), then write to another collection:

db.pages.aggregate([
  { "$unwind": "$page_likes" },
  { "$project": { "_id": 0 } },
  { "$out": "pagesArray" }
])

Then you run essentially the same mongoexport as above, pointing -c at pagesArray, since it is also an ordinary collection.

If you really don't want to create either a "View" or another collection, you could simply send a short script to the mongo shell, albeit in a very hacky way:

mongo --quiet --eval '
    print("user_id,page_likes.id,page_likes.name");
    db.pages.aggregate([ 
      { "$unwind": "$page_likes" },
      { "$project": { "_id": 0 } },
    ]).forEach(p => print(`${p.user_id},${p.page_likes.id},${p.page_likes.name}`))'

Or even without aggregate() and $unwind at all:

mongo --quiet --eval '
    print("user_id,page_likes.id,page_likes.name");
    db.pages.find({},{ _id: 0 }).forEach(p =>
       p.page_likes.forEach(l => print(`${p.user_id},${l.id},${l.name}`)))'

Which gives you the same output:

user_id,page_likes.id,page_likes.name
0939bf9w9804842f9f817ad100,859302873383,Hotdogs
0939bf9w9804842f9f817ad100,8593683902,Video Games
0939bf9w9804842f9f817ad100,849204859849028,Road Bikes

Note also that if you want or need a delimiter other than a comma, either of the last two shell approaches is probably the way to go. Delimiter support is scheduled for addition to mongoexport and mongoimport under TOOLS-87, but that ticket is yet to be resolved. So if you want different output, you have to produce it yourself.
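As a rough sketch of the do-it-yourself route, the formatting could be pulled into a small function that takes the delimiter as a parameter. The helper names quoteField and toDelimitedLines are hypothetical, and the quoting rule (wrap a field in double quotes when it contains the delimiter, a double quote, or a line break) is an assumption borrowed from common CSV practice, not something the shell one-liners above do. The sketch runs in Node.js; inside the mongo shell you would call print() instead of console.log() and feed it each document from the find() cursor:

```javascript
// Hypothetical helpers for illustration only; not part of MongoDB.
function quoteField(value, delimiter) {
  const text = String(value);
  // Quote only when the field contains the delimiter, a double quote,
  // or a line break; embedded double quotes are doubled.
  const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

function toDelimitedLines(doc, delimiter) {
  // Tolerate documents that have no page_likes array at all.
  return (doc.page_likes || []).map(like =>
    [doc.user_id, like.id, like.name]
      .map(field => quoteField(field, delimiter))
      .join(delimiter));
}

const sample = {
  user_id: "0939bf9w9804842f9f817ad100",
  page_likes: [
    { id: "859302873383", name: "Hotdogs" },
    { id: "8593683902", name: "Video Games" }
  ]
};

// Tab-separated output, one line per liked page.
toDelimitedLines(sample, "\t").forEach(line => console.log(line));
```

Quoting matters mostly for page names, which are free text and may themselves contain commas or quotes.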
