MongoDB calculating score from existing fields and putting it into a new field in the same collection
Problem description
I'm working on MongoDB and I have one collection, let's say Collection1. I have to calculate a score from existing fields in Collection1 and put the result into a new field, Field8, in Collection1.
Collection1:
db.Collection1.find().pretty().limit(2)
{
"_id": ObjectId("5717a5d4578f3f2556f300f2"),
"Field1": "XXXX",
"Field2": 0,
"Field3": 169,
"Field4": 230,
"Field5": "...4.67", // This field refers to days in a week
"Field6": "ZZ",
"Field7": "LO"
}, {
"_id": ObjectId("17a5d4575f300f278f3f2556"),
"Field1": "YYYY",
"Field2": 1,
"Field3": 260,
"Field4": 80,
"Field5": "1.3....", // This field refers to days in a week
"Field6": "YY",
"Field7": "PK"
}
So I have to do some calculations on my collection's fields with the following formula, but I don't know how to proceed:
Score = C1*C2*C3*C4
C1 = 10 + 0.03*Field3
C2 = 1 if Field2 equals 1, or 0.03 if Field2 equals 0
C3 = 1, 2, ... or 7, depending on Field5; for example, C3 for the document with "Field5": "...4.67" should be 3, meaning three days per week
C4 = Field4^-0.6 if Field2 equals 1, or 1 if Field2 equals 0
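Written out with explicit cases and checked by hand against the two sample documents, the formula works out as below. This assumes Field2 only ever holds 0 or 1, and reads C3 as 7 minus the number of dots in Field5, which reproduces the "...4.67" example (4 dots, so C3 = 3).

```latex
\begin{aligned}
\text{Score} &= C_1 \, C_2 \, C_3 \, C_4 \\
C_1 &= 10 + 0.03\,\text{Field3} \\
C_2 &= \begin{cases} 1 & \text{if Field2} = 1 \\ 0.03 & \text{if Field2} = 0 \end{cases} \\
C_3 &= 7 - \#\{\text{``.'' characters in Field5}\} \\
C_4 &= \begin{cases} \text{Field4}^{-0.6} & \text{if Field2} = 1 \\ 1 & \text{if Field2} = 0 \end{cases} \\[6pt]
\text{Doc 1: } & (10 + 0.03 \cdot 169) \cdot 0.03 \cdot 3 \cdot 1 = 15.07 \cdot 0.09 = 1.3563 \\
\text{Doc 2: } & (10 + 0.03 \cdot 260) \cdot 1 \cdot 2 \cdot 80^{-0.6} \approx 17.8 \cdot 2 \cdot 0.07213 \approx 2.568
\end{aligned}
```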
After calculating this score, I should put it in a new field, Field8, in my Collection1 and get something like this:
db.Collection1.find().pretty().limit(2)
{
"_id": ObjectId("5717a5d4578f3f2556f300f2"),
"Field1": "XXXX",
"Field2": 0,
"Field3": 169,
"Field4": 230,
"Field5": "...4.67", // This field refers to days in a week
"Field6": "ZZ",
"Field7": "LO",
"Field8": Score // My calculated score
}, {
"_id": ObjectId("17a5d4575f300f278f3f2556"),
"Field1": "YYYY",
"Field2": 1,
"Field3": 260,
"Field4": 80,
"Field5": "1.3....", // This field refers to days in a week
"Field6": "YY",
"Field7": "PK",
"Field8": Score // My calculated score
}
How can I achieve the above?
Depending on your application needs, you can use the aggregation framework to calculate the score and bulkWrite() to update your collection. Consider the following example, which uses a $project pipeline stage to carry out the score calculations with the arithmetic operators.
Since the logic for calculating C3 in your question yields a number from 1 to 7 that equals exactly 7 minus the number of dots (.) in Field5, the only feasible approach I can think of is to first store an extra field holding this value before doing the aggregation. So your first step is to create that extra field, which you can do with bulkWrite() as follows:
Step 1: Modify the schema to accommodate an extra daysInWeek field
var counter = 0, bulkUpdateOps = [];
db.Collection1.find({
    "Field5": { "$exists": true }
}).forEach(function(doc) {
    // count the literal dots in Field5; the dot must be escaped,
    // because an unescaped "." in a regex matches any character
    var points, daysInWeek;
    points = (doc.Field5.match(/\./g) || []).length;
    daysInWeek = 7 - points;
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { "daysInWeek": daysInWeek }
            }
        }
    });
    counter++;
    // send the queued updates to the server once every 500 documents
    if (counter % 500 == 0) {
        db.Collection1.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});
// flush the remaining updates, if any
if (counter % 500 != 0) { db.Collection1.bulkWrite(bulkUpdateOps); }
Ideally, the above operation could also calculate the other constants in your question and thereby create Field8 directly. However, I believe computations like this should be done on the client, letting MongoDB do what it does best on the server.
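Following that suggestion, the whole score can be computed in plain JavaScript on the client. The sketch below is a minimal, hypothetical helper (computeScore is not part of any MongoDB API); it assumes Field2 is 0 or 1 and Field5 is a string such as "...4.67". Note the escaped dot in the regular expression: an unescaped /./g matches every character, so it would count all 7 characters instead of just the dots.

```javascript
// Hypothetical client-side helper (not a MongoDB API): computes Field8 for one
// document, assuming Field2 is 0 or 1 and Field5 is a string like "...4.67".
function computeScore(doc) {
  const c1 = 10 + 0.03 * doc.Field3;
  const c2 = doc.Field2 === 1 ? 1 : 0.03;
  // Count literal dots only; an unescaped /./g would match every character.
  const c3 = 7 - (doc.Field5.match(/\./g) || []).length;
  const c4 = doc.Field2 === 1 ? Math.pow(doc.Field4, -0.6) : 1;
  return c1 * c2 * c3 * c4;
}

// The two sample documents from the question, reduced to the fields used.
const samples = [
  { Field2: 0, Field3: 169, Field4: 230, Field5: "...4.67" },
  { Field2: 1, Field3: 260, Field4: 80, Field5: "1.3...." },
];

for (const doc of samples) {
  console.log(computeScore(doc).toFixed(4)); // roughly 1.3563, then 2.5680
}

// The pitfall: without escaping, every character is counted.
console.log("...4.67".match(/./g).length);   // 7
console.log("...4.67".match(/\./g).length);  // 4
```

In the mongo shell, the same function could be called inside the cursor's forEach() and its result written with $set, which avoids needing the intermediate daysInWeek field at all.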
Step 2: Use aggregation to add the Field8 field
Having created the extra daysInWeek field, you can then construct an aggregation pipeline that projects the new variables using a set of arithmetic operators to do the computation (again, I would recommend doing such computations in the application layer). The final projection is the product of the computed fields; you can then iterate the aggregation result cursor and add Field8 to each document in the collection:
var pipeline = [
    {
        // compute the four constants for each document
        "$project": {
            "C1": {
                "$add": [
                    10,
                    { "$multiply": [ "$Field3", 0.03 ] }
                ]
            },
            "C2": {
                "$cond": [
                    { "$eq": [ "$Field2", 1 ] },
                    1,
                    0.03
                ]
            },
            "C3": "$daysInWeek",
            "C4": {
                "$cond": [
                    { "$eq": [ "$Field2", 1 ] },
                    { "$pow": [ "$Field4", -0.6 ] },
                    1
                ]
            }
        }
    },
    {
        // multiply them into the final score (_id is kept by default)
        "$project": {
            "Field8": { "$multiply": [ "$C1", "$C2", "$C3", "$C4" ] }
        }
    }
],
counter = 0,
bulkUpdateOps = [];
db.Collection1.aggregate(pipeline).forEach(function(doc) {
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { "Field8": doc.Field8 }
            }
        }
    });
    counter++;
    // send the queued updates to the server once every 500 documents
    if (counter % 500 == 0) {
        db.Collection1.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});
// flush the remaining updates, if any
if (counter % 500 != 0) { db.Collection1.bulkWrite(bulkUpdateOps); }
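Both steps above rely on the same batching pattern: queue updateOne operations and flush them every 500 documents. As a shell-independent sketch of that pattern, the hypothetical helper toUpdateBatches below (its name and its batchSize parameter are illustrative, not part of MongoDB) turns aggregation results shaped like { _id, Field8 } into arrays of bulkWrite() operations.

```javascript
// Hypothetical helper: converts aggregation results of the form { _id, Field8 }
// into arrays of bulkWrite() "updateOne" operations, at most batchSize per array.
function toUpdateBatches(results, batchSize = 500) {
  const batches = [];
  let current = [];
  for (const doc of results) {
    current.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { Field8: doc.Field8 } },
      },
    });
    if (current.length === batchSize) {
      batches.push(current); // full batch: ready to send
      current = [];
    }
  }
  if (current.length > 0) batches.push(current); // leftover partial batch
  return batches;
}

// Usage with fake results: 1201 documents give batches of 500, 500 and 201.
const fake = Array.from({ length: 1201 }, (_, i) => ({ _id: i, Field8: i * 0.5 }));
const batches = toUpdateBatches(fake);
console.log(batches.map((b) => b.length)); // [ 500, 500, 201 ]
```

In the mongo shell (3.2 or later), each returned batch could then be passed to db.Collection1.bulkWrite(batch). Building all batches up front keeps every result in memory, so for large collections the streaming counter approach used in the answer remains the safer choice.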
For MongoDB versions 2.6 through 3.0, use the Bulk Operations API: iterate the collection with the cursor's forEach() method and update each document in the collection.
Some of the arithmetic operators in the above aggregation pipeline (notably $pow, which was introduced in MongoDB 3.2) are not available in MongoDB 2.6 through 3.0, so you will need to do the computations within the forEach() iteration.
Use the Bulk API to reduce the number of write requests: queue each update in a bulk operation and send the batch to the server only once every 500 documents:
var bulkUpdateOps = db.Collection1.initializeUnorderedBulkOp(),
    cursor = db.Collection1.find({ "Field5": { "$exists": true } }),
    counter = 0;
cursor.forEach(function(doc) {
    // compute the constants on the client, since $pow is unavailable before 3.2
    var c1, c2, c3, c4, Field8;
    c1 = 10 + (0.03 * doc.Field3);
    c2 = (doc.Field2 == 1) ? 1 : 0.03;
    // count literal dots only: "." must be escaped in the regex
    c3 = 7 - (doc.Field5.match(/\./g) || []).length;
    c4 = (doc.Field2 == 1) ? Math.pow(doc.Field4, -0.6) : 1;
    Field8 = c1 * c2 * c3 * c4;
    bulkUpdateOps.find({ "_id": doc._id }).updateOne({
        "$set": { "Field8": Field8 }
    });
    counter++;
    // execute every 500 queued updates, then start a new bulk operation
    if (counter % 500 == 0) {
        bulkUpdateOps.execute();
        bulkUpdateOps = db.Collection1.initializeUnorderedBulkOp();
    }
});
// execute the remaining updates, if any
if (counter % 500 != 0) { bulkUpdateOps.execute(); }