MongoDB: calculating a score from existing fields and putting it into a new field in the same collection


Problem description

I'm working with MongoDB and I have one collection, let's say Collection1.

I have to calculate a score from existing fields in Collection1, and put the result into a new field Field8 in Collection1.

Collection1 :

db.Collection1.find().pretty().limit(2)
{
    "_id": ObjectId("5717a5d4578f3f2556f300f2"),
    "Field1": "XXXX",
    "Field2": 0,
    "Field3": 169,
    "Field4": 230,
    "Field5": "...4.67", // This field refers to days in a week
    "Field6": "ZZ",
    "Field7": "LO"
},
{
    "_id": ObjectId("17a5d4575f300f278f3f2556"),
    "Field1": "YYYY",
    "Field2": 1,
    "Field3": 260,
    "Field4": 80,
    "Field5": "1.3....", // This field refers to days in a week
    "Field6": "YY",
    "Field7": "PK"
}

So, I have to run some calculations on the fields of this collection using the following formula, but I don't know how to proceed:

Score = C1*C2*C3*C4

C1 = 10 + 0.03*Field3
C2 = 1 if Field2 equals 1, or 0.03 if Field2 equals 0
C3 = 1, 2, ... or 7, depending on Field5; for example, for a document with "Field5": "...4.67", C3 should be 3, meaning three days per week
C4 = 1 if Field2 equals 0, or Field4^-0.6 if Field2 equals 1

After calculating this score, I should put it in a new field Field8 in Collection1 and get something like this:

db.Collection1.find().pretty().limit(2)
{
    "_id": ObjectId("5717a5d4578f3f2556f300f2"),
    "Field1": "XXXX",
    "Field2": 0,
    "Field3": 169,
    "Field4": 230,
    "Field5": "...4.67", // This field refers to days in a week
    "Field6": "ZZ",
    "Field7": "LO",
    "Field8": Score // My calculated score
},
{
    "_id": ObjectId("17a5d4575f300f278f3f2556"),
    "Field1": "YYYY",
    "Field2": 1,
    "Field3": 260,
    "Field4": 80,
    "Field5": "1.3....", // This field refers to days in a week
    "Field6": "YY",
    "Field7": "PK",
    "Field8": Score // My calculated score
}

How can I achieve the above?

Solution

Depending on your application's needs, you can use the aggregation framework to calculate the score and bulkWrite() to update your collection. Consider the following example, which uses the $project pipeline stage with arithmetic operators to do the score calculation.

Since the logic for C3 in your question yields a number from 1 to 7 that equals exactly 7 minus the number of dots (.) in Field5, the only feasible approach I can think of is to store an extra field holding this value before running the aggregation. So the first step is to create that extra field, which you can do with bulkWrite() as follows:


Step 1: Modify the schema to accommodate an extra daysInWeek field

var counter = 0, bulkUpdateOps = [];

db.Collection1.find({
    "Field5": { "$exists": true }
}).forEach(function(doc) {
    // count the literal dots in Field5; the dot must be escaped,
    // because an unescaped "." in a regex matches any character
    var points, daysInWeek;
    points = (doc.Field5.match(/\./g) || []).length;
    daysInWeek = 7 - points;
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { "daysInWeek": daysInWeek }
            }
        }
    });
    counter++;

    // send a batch to the server every 500 operations
    if (counter % 500 == 0) {
        db.Collection1.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

// flush any remaining operations
if (counter % 500 != 0) { db.Collection1.bulkWrite(bulkUpdateOps); }
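
The dot-counting rule can be checked in isolation before touching the collection. The snippet below is a minimal sketch in plain JavaScript; daysInWeek is a hypothetical helper name, and it assumes Field5 is always a 7-character string in which "." marks a day without activity, as in the sample documents.

```javascript
// Hypothetical helper (not part of the original answer): returns the number
// of days encoded in a string such as "...4.67", where "." marks an empty day.
function daysInWeek(field5) {
    // /\./g matches literal dots only; an unescaped "." would match every character.
    var dots = (field5.match(/\./g) || []).length;
    return 7 - dots;
}

console.log(daysInWeek("...4.67")); // 3, the value expected in the question
console.log(daysInWeek("1.3....")); // 2
```

Running a check like this on a few known values is a cheap way to confirm the regular expression before issuing thousands of updates.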

Ideally, the above operation could also compute the other constants in your question and so produce Field8 directly. However, I believe computations like this should be done on the client, letting MongoDB do what it does best on the server.
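
As a rough illustration of doing the whole computation on the client, the formula can be written as one plain function. This is a sketch under assumptions: computeScore is a hypothetical name, the mapping of Field2 to C2 and C4 follows the question, and the two sample objects only carry the fields the formula needs.

```javascript
// Hypothetical client-side version of Score = C1*C2*C3*C4.
function computeScore(doc) {
    var c1 = 10 + 0.03 * doc.Field3;
    var c2 = (doc.Field2 == 1) ? 1 : 0.03;
    var c3 = 7 - (doc.Field5.match(/\./g) || []).length; // days per week
    var c4 = (doc.Field2 == 1) ? Math.pow(doc.Field4, -0.6) : 1;
    return c1 * c2 * c3 * c4;
}

// Reduced copies of the two sample documents from the question.
var samples = [
    { Field2: 0, Field3: 169, Field4: 230, Field5: "...4.67" },
    { Field2: 1, Field3: 260, Field4: 80,  Field5: "1.3...." }
];

samples.forEach(function (doc) {
    console.log(computeScore(doc));
});
```

A function like this could be called inside the forEach() loop of Step 1 so that Field8 is written in the same pass as daysInWeek.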


Step 2: Use aggregation to add the Field8 field

Having created the extra daysInWeek field, you can build an aggregation pipeline that projects the new variables using arithmetic operators (again, I would recommend doing such computations in the application layer). The final projection is the product of the computed fields; you can then iterate the aggregation result cursor and add Field8 to each document in the collection:

var pipeline = [
        {
            "$project": {
                "C1": {
                    "$add": [ 
                        10, 
                        { "$multiply": [ "$Field3", 0.03 ] } 
                    ]
                },
                "C2": {
                    "$cond": [
                        { "$eq": [ "$Field2", 1 ] }, 
                        1, 
                        0.03 
                    ]
                },
                "C3": "$daysInWeek",
                "C4": {
                    "$cond": [
                        { "$eq": [ "$Field2", 1 ]  },
                        { "$pow": [ "$Field4", -0.6 ] },
                        1
                    ]
                }
            }
        },
        {
            "$project": {
                "Field8": { "$multiply": [ "$C1", "$C2", "$C3", "$C4" ] }
            }
        }
    ],
    counter = 0,
    bulkUpdateOps = [];

db.Collection1.aggregate(pipeline).forEach(function(doc) {
    // queue one update per aggregated document
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { "Field8": doc.Field8 }
            }
        }
    });
    counter++;

    // send a batch to the server every 500 operations
    if (counter % 500 == 0) {
        db.Collection1.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

// flush any remaining operations
if (counter % 500 != 0) { db.Collection1.bulkWrite(bulkUpdateOps); }
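
On MongoDB 4.2 or later, update commands accept an aggregation pipeline, so both steps could in principle be collapsed into a single server-side updateMany(). The following is a sketch under that assumption and is not part of the original answer. It swaps the stored daysInWeek field for a dot count computed with $split and $size, and the names isFlagged and scoreUpdate are hypothetical. The guard around db only lets the snippet load outside the shell.

```javascript
// Assumes MongoDB 4.2+ (pipeline-style updates); not from the original answer.
var isFlagged = { "$eq": [ "$Field2", 1 ] };

var scoreUpdate = [
    {
        "$set": {
            "Field8": {
                "$multiply": [
                    // C1 = 10 + 0.03 * Field3
                    { "$add": [ 10, { "$multiply": [ "$Field3", 0.03 ] } ] },
                    // C2 = 1 when Field2 is 1, otherwise 0.03
                    { "$cond": [ isFlagged, 1, 0.03 ] },
                    // C3: splitting on "." yields (dots + 1) pieces,
                    // so 7 - dots equals 8 - pieces
                    { "$subtract": [ 8, { "$size": { "$split": [ "$Field5", "." ] } } ] },
                    // C4 = Field4^-0.6 when Field2 is 1, otherwise 1
                    { "$cond": [ isFlagged, { "$pow": [ "$Field4", -0.6 ] }, 1 ] }
                ]
            }
        }
    }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.Collection1.updateMany({ "Field5": { "$type": "string" } }, scoreUpdate);
}
```

This avoids the round trip through the client entirely, at the cost of requiring a newer server than the bulkWrite() approach above.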


For MongoDB >= 2.6 and <= 3.0, use the Bulk Operations API: iterate the collection with the cursor's forEach() method and update each document.

Some of the arithmetic operators from the above aggregation pipeline are not available in MongoDB >= 2.6 and <= 3.0, so you will need to do the computations within the forEach() iteration.

Use the bulk API to reduce server write requests by batching the updates and sending them to the server only once per 500 documents:

var bulkUpdateOps = db.Collection1.initializeUnorderedBulkOp(),
    cursor = db.Collection1.find(), // cursor over every document
    counter = 0;

cursor.forEach(function(doc) {
    // computations
    var c1, c2, c3, c4, Field8;
    c1 = 10 + (0.03 * doc.Field3);
    c2 = (doc.Field2 == 1) ? 1 : 0.03;
    // count literal dots; the dot must be escaped in the regex
    c3 = 7 - (doc.Field5.match(/\./g) || []).length;
    c4 = (doc.Field2 == 1) ? Math.pow(doc.Field4, -0.6) : 1;
    Field8 = c1 * c2 * c3 * c4;

    bulkUpdateOps.find({ "_id": doc._id }).updateOne({
        "$set": { "Field8": Field8 }
    });
    counter++;

    if (counter % 500 == 0) {
        // send the current batch and start a new one
        bulkUpdateOps.execute();
        bulkUpdateOps = db.Collection1.initializeUnorderedBulkOp();
    }
});

// flush any remaining operations
if (counter % 500 != 0) { bulkUpdateOps.execute(); }
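
The batching pattern itself (queue operations, flush every 500, then flush the remainder) can be exercised without a database. The sketch below is illustrative only: writeInBatches is a hypothetical helper, and the write callback stands in for a call such as bulkWrite() or execute().

```javascript
// Hypothetical helper illustrating the batching pattern; `write` is supplied
// by the caller and receives one array of queued operations per flush.
function writeInBatches(ops, batchSize, write) {
    var batch = [], flushes = 0;
    ops.forEach(function (op) {
        batch.push(op);
        if (batch.length === batchSize) {
            write(batch);
            batch = [];
            flushes++;
        }
    });
    if (batch.length > 0) { // flush the final partial batch, if any
        write(batch);
        flushes++;
    }
    return flushes;
}

// 1201 fake operations in batches of 500: two full batches plus one of 201.
var fakeOps = [];
for (var i = 0; i < 1201; i++) { fakeOps.push({ n: i }); }

var sizes = [];
var flushes = writeInBatches(fakeOps, 500, function (batch) {
    sizes.push(batch.length);
});
console.log(flushes, sizes); // 3 flushes with sizes 500, 500, 201
```

Checking the remainder by batch length rather than by a separate counter means an empty batch is never sent, which matters because executing a bulk operation with nothing queued raises an error in the shell.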
