Unordered bulk update records in MongoDB shell

Question

I've got a collection consisting of millions of documents that resemble the following:

{
    _id: ObjectId('...'),
    value: "0.53",
    combo: [
        {
            h: 0,
            v: "0.42"
        },
        {
            h: 1,
            v: "1.32"
        }
    ]
}

The problem is that the values are stored as strings and I need to convert them to float/double.

I'm trying this and it's working but this'll take days to complete, given the volume of data:

db.collection.find({}).forEach(function(obj) { 
    if (typeof(obj.value) === "string") {
        obj.value = parseFloat(obj.value);
        db.collection.save(obj);
    }

     // Array elements store the number under "v", not "value".
     obj.combo.forEach(function(hv){
         if (typeof(hv.v) === "string") {
            hv.v = parseFloat(hv.v);
            db.collection.save(obj);
         }
     });
});

I came across bulk updates while reading the MongoDB docs, and I'm trying this:

var bulk = db.collection.initializeUnorderedBulkOp();
bulk.find({}).update(
    { 
      $set: { 
                "value": parseFloat("value"), 
            }
    });
bulk.execute();

This runs, but I get NaN as the value, because it thinks I'm trying to convert the literal string "value" to a float. I've tried variations such as this.value and "$value", but to no avail. Also, this approach only attempts to fix the top-level value, not the ones in the array.
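
The NaN can be reproduced without a database. As far as I can tell, the shell evaluates the parseFloat call in JavaScript while building the update document, so the server only ever receives the result. A minimal sketch in plain JavaScript (the variable name update is just for illustration):

```javascript
// The update document is built client-side, before anything is sent to the server.
const update = { $set: { value: parseFloat("value") } };

// parseFloat received the literal string "value", not a field of any document.
console.log(Number.isNaN(update.$set.value)); // true

// Parsing a string that was actually read from a document works as expected.
console.log(parseFloat("0.53")); // 0.53
```

This is why the conversion has to happen per document, on a value that was actually read from that document.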

I'd appreciate any help. Thanks in advance!

Recommended answer

Figured it out the following way:

1) To convert at the document level, I came across this post and the reply by Markus paved the way to my solution:

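var bulk = db.collection.initializeUnorderedBulkOp();
var ops = 0;
db.collection.find().forEach(function (myDoc) {
  // Queue one update per document; parseFloat runs client-side on the fetched value.
  bulk.find({ _id: myDoc._id }).updateOne({
    $set: { "value": parseFloat(myDoc.value) }
  });

  // Send the queued updates every 1000 operations, then start a fresh batch.
  if ((++ops % 1000) === 0) {
    bulk.execute();
    bulk = db.collection.initializeUnorderedBulkOp();
  }
});
// Flush the remainder; the shell raises an error when an empty bulk is executed.
if (ops % 1000 !== 0) {
  bulk.execute();
}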
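
The batching pattern itself can be tried without a database. The sketch below mirrors the control flow of the snippet above, plus a guard on the last flush, since the shell raises an error when a bulk with no queued operations is executed. makeFakeBulk and runInBatches are hypothetical stand-ins written for this illustration, not MongoDB APIs:

```javascript
// Hypothetical in-memory stand-in for a bulk operation; not a MongoDB API.
function makeFakeBulk(log) {
  let queued = 0;
  return {
    queue() { queued += 1; },
    execute() {
      // Mimic the shell, which refuses to execute an empty bulk.
      if (queued === 0) throw new Error("no operations in bulk");
      log.push(queued); // record the size of the batch that was sent
      queued = 0;
    },
  };
}

// Queue one operation per id and flush every batchSize operations.
function runInBatches(ids, batchSize) {
  const log = [];
  let bulk = makeFakeBulk(log);
  let ops = 0;
  for (const id of ids) {
    bulk.queue(id);
    if (++ops % batchSize === 0) {
      bulk.execute();
      bulk = makeFakeBulk(log);
    }
  }
  // Flush the remainder only when something is still pending.
  if (ops % batchSize !== 0) bulk.execute();
  return log;
}

console.log(runInBatches([1, 2, 3, 4, 5], 2)); // [ 2, 2, 1 ]
console.log(runInBatches([1, 2, 3, 4], 2));    // [ 2, 2 ]
```

The second call shows the case the guard is for: the number of documents is an exact multiple of the batch size, so nothing is pending at the end.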

2) The second part involved updating the values of the array objects, and I found the syntax for that in the accepted answer on this post. In my case, I knew that there were 24 values in each combo array. I ran this separately from the first query, and it looked like this:

var bulk = db.collection.initializeUnorderedBulkOp();
var ops = 0;
db.collection.find().forEach(function (myDoc) {
  // Build the $set fields "combo.0.v" through "combo.23.v" in a loop
  // instead of listing all 24 array positions by hand.
  var fields = {};
  for (var i = 0; i < 24; i++) {
    fields["combo." + i + ".v"] = parseFloat(myDoc.combo[i].v);
  }

  bulk.find({ _id: myDoc._id }).update({ $set: fields });

  // Send the queued updates every 1000 operations, then start a fresh batch.
  if ((++ops % 1000) === 0) {
    bulk.execute();
    bulk = db.collection.initializeUnorderedBulkOp();
  }
});
// Flush the remainder; the shell raises an error when an empty bulk is executed.
if (ops % 1000 !== 0) {
  bulk.execute();
}
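
If the array length is not known in advance, the $set document could be derived from each document instead. The following is only a sketch under that assumption: buildSetDoc is a hypothetical helper, not part of the answer above, and it covers both the top-level value and the combo entries in one pass, touching only fields that are still strings:

```javascript
// Hypothetical helper: returns the $set fields for one source document.
function buildSetDoc(doc) {
  const fields = {};
  if (typeof doc.value === "string") {
    fields.value = parseFloat(doc.value);
  }
  // Dotted paths such as "combo.0.v" address one array element each.
  (doc.combo || []).forEach((hv, i) => {
    if (typeof hv.v === "string") {
      fields["combo." + i + ".v"] = parseFloat(hv.v);
    }
  });
  return fields;
}

const sample = {
  value: "0.53",
  combo: [{ h: 0, v: "0.42" }, { h: 1, v: "1.32" }],
};
console.log(buildSetDoc(sample));
// { value: 0.53, 'combo.0.v': 0.42, 'combo.1.v': 1.32 }
```

In the shell this might be used as bulk.find({ _id: myDoc._id }).updateOne({ $set: buildSetDoc(myDoc) }). Documents for which the helper returns an empty object are probably best skipped, since older server versions reject an empty $set.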

To give an idea of performance: the forEach was going through around 900 documents a minute, which for 15 million records would literally have taken days. On top of that, it was only converting the types at the document level, not the array level; for that, I would have had to loop through each document and then through each array (15 million x 24 iterations). With this approach (running both queries side by side), both finished in under 6 hours.
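
For a rough sense of scale, the figures quoted above can be turned into a quick estimate. This is back-of-the-envelope arithmetic on the stated numbers only (15 million documents, about 900 documents a minute, under 6 hours), not a measurement:

```javascript
const docs = 15e6;     // documents in the collection, as stated above
const perMinute = 900; // observed forEach throughput, as stated above

// Time the per-document forEach would have needed, in days.
const days = docs / perMinute / 60 / 24;
console.log(days.toFixed(1)); // 11.6

// Minimum throughput implied by finishing each pass in under 6 hours.
const bulkPerMinute = Math.round(docs / (6 * 60));
console.log(bulkPerMinute); // 41667
```

So each bulk pass must have processed at least roughly 40,000 documents a minute, compared with about 900 for the original loop.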

I hope this helps someone else.
