使用mongoimport将csv数据作为数组导入mongodb中 [英] Import csv data as array in mongodb using mongoimport

查看:179
本文介绍了使用mongoimport将csv数据作为数组导入mongodb中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试使用mongoimport将csv数据导入mongodb.该集合是这样的:

I have been trying to import the csv data into mongodb using mongoimport. The collection is like this:

{
id:"122234343",
name: "name1",
children: ["222334444","333344444"]
}

我尝试的一种方法是创建2个csv文件-一个ID为&名称和其他带有ID的子元素(如果id有两个子元素,那么它将有两行).使用mongoimport将数据导入两个不同的集合中,然后在第二个具有子数据的集合中使用foreach()来更新集合数据.

One approach I tried is to create 2 csv files - one with id & name and other with id, children (if id has two children then it will have two rows). Import the data into two different collections using mongoimport and then use update the collection data using foreach() at second collection having children data.

请建议是否还有其他方法可以直接从CSV填充此子级"数组?

Please suggest is there any another way to populate this "children" array directly from CSV??

推荐答案

对我来说,最简单的方法是弄清楚如何为

To me, the simplest way to work out how to format your "CSV" for mongoimport is to simply create a collection then use mongoexport on it to see what the CSV format should look like.

因此从外壳创建您的文档:

So create your document from the shell:

db.newcol.insert({
  id:"122234343",
  name: "name1",
  children: ["222334444","333344444"]
})

然后退出外壳并运行mongoexport:

 mongoexport -d test -c testcol --fields id,name,children --type csv > out.csv

哪个将显示为:

id,name,children
122234343,name1,"[""222334444"",""333344444""]"

其中数组"用字符串"表示,并使用引号""以其转义形式表示.

Where the "array" is represented with a "string" and using the quotes "" in their escaped form.

现在这是一个使用mongoimport的明确位置,所以现在只需导入"即可进行测试:

That now is a pretty clear place to use mongoimport from, so just "import" now to test:

mongoimport -d test -c newcol --headerline --type csv out.csv

重新输入外壳并查看新集合中的文档:

Re-enter the shell and see the document(s) in the new collection:

db.newcol.findOne()
{
        "_id" : ObjectId("59476924d6eb0b7d6ac37e02"),
        "id" : 122234343,
        "name" : "name1",
        "children" : "[\"222334444\",\"333344444\"]"
}

所有内容均已存在,这些子项被列为字符串"而不是数组.但这并不是一个真正的问题,因为我们已经导入了数据,而现在正是我们进行实际转换的原因:

So everything is there, BUT the children are listed as a "string" rather than an array. But this is not really a problem, since we got the data imported and now it's just up us to now actually transform it:

var ops = [];
db.testcol.find({ "children": { "$type": 2} }).forEach(doc => {
  var children = doc.children.split(',').map( e => e.replace(/"|\[|\]|\\/gm,'').toString() );
  ops.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": { "$set": { "children": children } }
    }
  });

  if ( ops.length >= 1000 ) {
    db.newcol.bulkWrite(ops);
    ops = [];
  }             
});

if ( ops.length > 0 ) {
  db.newcol.bulkWrite(ops);
  ops = [];
}

因此,这将迭代通过

So that is going to iterate anything that was imported into the collection that has a BSON type of 2 which is "string" via the $type query operator.

然后,我们将字符串取为字符串,将其拆分为数组,然后剥离其他字符以仅保留所需的值.

Then we take the string, split it as an array and strip the other characters to only leave the value you want.

使用 .bulkWrite() 提交这些更新以一种有效的方式,而不是每个请求都写每个文档.实际上,它们是按1000个批次批量发送到服务器的.

Using .bulkWrite() you commit those updates in an efficient way, rather than writing every single document per request. They are actually sent in batches of 1000 to the server.

最终结果是原始所需形式的文档:

The end result is the document in the original wanted form:

db.testcol.findOne()
{
        "_id" : ObjectId("5947652ccb237bd6e4e902a5"),
        "id" : "122234343",
        "name" : "name1",
        "children" : [
                "222334444",
                "333344444"
        ]
}

这就是我逐步"介绍如何计算CSV格式,导入它,然后将数据转换"为所需状态的逐步".

So that is my "step by step" of how you can work out your CSV format, import it and then "transform" the data into the state that you need it.

这篇关于使用mongoimport将csv数据作为数组导入mongodb中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆