通过拆分字段值重塑文档 [英] Reshape documents by splitting a field value

查看:48
本文介绍了通过拆分字段值重塑文档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们有原始数据的收集:

Suppose we have a collection of raw data:

{ "person": "David, age 102"}
{ "person": "Max, age 8" }

,我们希望将该集合转换为:

and we'd like to transform that collection to:

{ "age": 102 }
{ "age": 8 }

仅使用mongo(d)引擎. (如果所有人的姓名或年龄都一样长,则$ substr可以胜任此工作,)可以吗?

using only mongo(d) engine. (If all person names or ages had equal lengths, $substr could do the job, ) Is it possible?

假设正则表达式是琐碎的/\ d +/

Suppose regex is trivial /\d+/

推荐答案

MongoDB 3.4版中的最佳方法.

此版本的mongod提供 $split 当然,该操作符会按此处所示的方式拆分字符串.

The optimal way in MongoDB version 3.4.

This version of mongod provides the $split operator which, of course split the string as shown here.

然后,我们使用 $let 变量运算符.然后可以在 in 表达式中使用新值,以使用

We then assign the the newly computed value to a variable using the $let variable operator. The new value can then be use in the in expression to return the "name" and the "age" values using the $arrayElemAt operator to return the element at a specified index; 0 for the first element and -1 for the last element.

请注意,在 in 表达式中,我们需要拆分最后一个元素,以便返回整数字符串.

Note that in the in expression we need to split the last element in order to return the string of integer.

最后,我们需要迭代Cursor对象,并使用 parseInt 并使用批量操作和 $set 这些字段的值以实现最大效率.

Finally we need to iterate the Cursor object and cast the convert the string of integer to numeric using Number or parseInt and use bulk operation and the bulkWrite() method to $set the value for those field for maximum efficiency.

let requests = [];
db.coll.aggregate(
    [
        { "$project": {  
            "person": { 
                "$let": { 
                    "vars": { 
                        "infos":  { "$split": [ "$person", "," ] } 
                    }, 
                    "in": { 
                        "name": { "$arrayElemAt": [ "$$infos", 0 ] }, 
                        "age": { 
                            "$arrayElemAt": [ 
                                { "$split": [ 
                                    { "$arrayElemAt": [ "$$infos", -1 ] }, 
                                    " " 
                                ]}, 
                                -1 
                            ] 
                        } 
                    } 
                } 
            }  
        }}
    ] 
).forEach(document => { 
    requests.push({ 
        "updateOne": { 
            "filter": { "_id": document._id }, 
            "update": { 
                "$set": { 
                    "name": document.person.name, 
                    "age": Number(document.person.age) 
                },
                "$unset": { "person": " " }
            } 
        } 
    }); 
    if ( requests.length === 500 ) { 
        // Execute per 500 ops and re-init
        db.coll.bulkWrite(requests); 
        requests = []; 
    }} 
);

 // Clean up queues
if(requests.length > 0) {
    db.coll.bulkWrite(requests);
}


MongoDB 3.2或更高版本.

MongoDB 3.2不推荐使用旧版 Bulk() API和其相关的方法,并提供了bulkWrite()方法,但它没有不提供$split运算符,因此我们这里唯一的选择是使用


MongoDB 3.2 or newer.

MongoDB 3.2 deprecates the old Bulk() API and its associated methods and provides the bulkWrite() method but it doesn't provide the $split operator so the only option we have here is to use the mapReduce() method to transform our data then update the collection using bulk operation.

var mapFunction = function() { 
    var person = {}, 
    infos = this.person.split(/[,\s]+/); 
    person["name"] = infos[0]; 
    person["age"] = infos[2]; 
    emit(this._id, person); 
};

var results = db.coll.mapReduce(
    mapFunction, 
    function(key, val) {}, 
    { "out": { "inline": 1 } }
)["results"];

results.forEach(document => { 
    requests.push({ 
        "updateOne": { 
            "filter": { "_id": document._id }, 
            "update": { 
                "$set": { 
                    "name": document.value.name, 
                    "age": Number(document.value.age) 
                }, 
                "$unset": { "person": " " }
            } 
        } 
    }); 
    if ( requests.length === 500 ) { 
        // Execute per 500 operations and re-init
        db.coll.bulkWrite(requests); 
        requests = []; 
    }} 
);

// Clean up queues
if(requests.length > 0) {
    db.coll.bulkWrite(requests);
}


MongoDB版本2.6或3.0.

我们需要使用现已弃用的批量API .


MongoDB version 2.6 or 3.0.

We need to use the now deprecated Bulk API.

var bulkOp = db.coll.initializeUnorderedBulkOp();
var count = 0;

results.forEach(function(document) { 
    bulkOp.find({ "_id": document._id}).updateOne(
        { 
            "$set": { 
                "name": document.value.name, 
                "age": Number(document.value.age)
            },
            "$unset": { "person": " " }
        }
    );
    count++;
    if (count === 500 ) {
        // Execute per 500 operations and re-init
        bulkOp.execute();
        bulkOp = db.coll.initializeUnorderedBulkOp();
    }
});

// clean up queues
if (count > 0 ) {
    bulkOp.execute();
}

这篇关于通过拆分字段值重塑文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆