通过拆分字段值来重塑文档 [英] Reshape documents by splitting a field value

查看:12
本文介绍了通过拆分字段值来重塑文档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们有一组原始数据:

Suppose we have a collection of raw data:

{ "person": "David, age 102"}
{ "person": "Max, age 8" }

我们想将该集合转换为:

and we'd like to transform that collection to:

{ "age": 102 }
{ "age": 8 }

仅使用 mongo(d) 引擎.(如果所有人名或年龄的长度相等, $substr 可以完成这项工作,)有可能吗?

using only mongo(d) engine. (If all person names or ages had equal lengths, $substr could do the job, ) Is it possible?

假设正则表达式是微不足道的/d+/

Suppose regex is trivial /d+/

推荐答案

MongoDB 3.4版本中的最优方式.

此版本的 mongod 提供 $split 运算符,它当然会拆分字符串,如此处所示.

然后我们使用 $let 变量运算符.然后可以在 in 表达式中使用新值,以使用 $arrayElemAt 运算符返回指定索引处的元素;0 表示第一个元素,-1 表示最后一个元素.

We then assign the the newly computed value to a variable using the $let variable operator. The new value can then be use in the in expression to return the "name" and the "age" values using the $arrayElemAt operator to return the element at a specified index; 0 for the first element and -1 for the last element.

请注意,在 in 表达式中,我们需要拆分最后一个元素才能返回整数字符串.

Note that in the in expression we need to split the last element in order to return the string of integer.

最后我们需要迭代 Cursor 对象并使用 NumberparseInt 并使用批量操作和 bulkWrite() 方法到 $set 这些字段的值以获得最大效率.

Finally we need to iterate the Cursor object and cast the convert the string of integer to numeric using Number or parseInt and use bulk operation and the bulkWrite() method to $set the value for those field for maximum efficiency.

let requests = [];
db.coll.aggregate(
    [
        { "$project": {  
            "person": { 
                "$let": { 
                    "vars": { 
                        "infos":  { "$split": [ "$person", "," ] } 
                    }, 
                    "in": { 
                        "name": { "$arrayElemAt": [ "$$infos", 0 ] }, 
                        "age": { 
                            "$arrayElemAt": [ 
                                { "$split": [ 
                                    { "$arrayElemAt": [ "$$infos", -1 ] }, 
                                    " " 
                                ]}, 
                                -1 
                            ] 
                        } 
                    } 
                } 
            }  
        }}
    ] 
).forEach(document => { 
    requests.push({ 
        "updateOne": { 
            "filter": { "_id": document._id }, 
            "update": { 
                "$set": { 
                    "name": document.person.name, 
                    "age": Number(document.person.age) 
                },
                "$unset": { "person": " " }
            } 
        } 
    }); 
    if ( requests.length === 500 ) { 
        // Execute per 500 ops and re-init
        db.coll.bulkWrite(requests); 
        requests = []; 
    }} 
);

 // Clean up queues
if(requests.length > 0) {
    db.coll.bulkWrite(requests);
}

<小时>

MongoDB 3.2 或更新版本.

MongoDB 3.2 弃用了旧的 Bulk() API 及其相关的方法 并提供bulkWrite() 方法,但它不提供 $split 运算符,因此我们这里唯一的选择是使用 mapReduce() 方法来转换我们的数据,然后使用批量操作更新集合.


MongoDB 3.2 or newer.

MongoDB 3.2 deprecates the old Bulk() API and its associated methods and provides the bulkWrite() method but it doesn't provide the $split operator so the only option we have here is to use the mapReduce() method to transform our data then update the collection using bulk operation.

var mapFunction = function() { 
    var person = {}, 
    infos = this.person.split(/[,s]+/); 
    person["name"] = infos[0]; 
    person["age"] = infos[2]; 
    emit(this._id, person); 
};

var results = db.coll.mapReduce(
    mapFunction, 
    function(key, val) {}, 
    { "out": { "inline": 1 } }
)["results"];

results.forEach(document => { 
    requests.push({ 
        "updateOne": { 
            "filter": { "_id": document._id }, 
            "update": { 
                "$set": { 
                    "name": document.value.name, 
                    "age": Number(document.value.age) 
                }, 
                "$unset": { "person": " " }
            } 
        } 
    }); 
    if ( requests.length === 500 ) { 
        // Execute per 500 operations and re-init
        db.coll.bulkWrite(requests); 
        requests = []; 
    }} 
);

// Clean up queues
if(requests.length > 0) {
    db.coll.bulkWrite(requests);
}

<小时>

MongoDB 版本 2.6 或 3.0.

我们需要使用现已弃用的 Bulk API.p>

var bulkOp = db.coll.initializeUnorderedBulkOp();
var count = 0;

results.forEach(function(document) { 
    bulkOp.find({ "_id": document._id}).updateOne(
        { 
            "$set": { 
                "name": document.value.name, 
                "age": Number(document.value.age)
            },
            "$unset": { "person": " " }
        }
    );
    count++;
    if (count === 500 ) {
        // Execute per 500 operations and re-init
        bulkOp.execute();
        bulkOp = db.coll.initializeUnorderedBulkOp();
    }
});

// clean up queues
if (count > 0 ) {
    bulkOp.execute();
}

这篇关于通过拆分字段值来重塑文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆