MongoDB aggregate query isn't returning the proper sum when using $sum
Question
I have a collection students with documents in the following format:
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the total marks of all the students with subject = "maths", so I should get 163 as the sum. This is the query I ran:
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
I expect the following result:
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get this instead:
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Recommended answer
In your current schema the marks field is stored as a string, but the aggregation framework needs a numeric type to compute the sum: $sum ignores non-numeric values, which is why the total comes back as 0. Alternatively, you can use MapReduce to calculate the sum, since its map function can call native JavaScript functions such as parseInt() on document properties. So overall you have two choices.
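To make the cause concrete, here is a minimal plain-JavaScript sketch of the behaviour described above. It does not call MongoDB; sumLikeAggregation is a hypothetical helper that only imitates how $sum adds numeric values and skips everything else.

```javascript
// Rough emulation of how $sum treats each value in a group:
// numbers are added, anything else (such as a string) is skipped.
// sumLikeAggregation is a hypothetical helper, not a MongoDB API.
function sumLikeAggregation(values) {
  return values.reduce(function (total, v) {
    return typeof v === "number" ? total + v : total;
  }, 0);
}

// marks of the two "maths" students, as stored in the question
var asStrings = ["77", "86"];
// the same marks after conversion with parseInt()
var asNumbers = asStrings.map(function (m) { return parseInt(m, 10); });

var stringTotal = sumLikeAggregation(asStrings);
var numberTotal = sumLikeAggregation(asNumbers);

console.log(stringTotal); // 0
console.log(numberTotal); // 163
```

The string values contribute nothing to the total, which matches the totalMarks of 0 reported in the question.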
The first is to change the schema, or to add another field that holds the actual numeric value instead of its string representation. If the collection is relatively small, you can combine the cursor's find() and forEach() methods with update() to convert the marks field:
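// $type 2 matches string values, so only unconverted documents are visited.
// cursor.snapshot() was removed in later MongoDB releases; drop the call if your shell rejects it.
db.students.find({ "marks": { "$type": 2 } }).snapshot().forEach(function (doc) {
    db.students.update(
        { "_id": doc._id, "marks": { "$type": 2 } },
        { "$set": { "marks": parseInt(doc.marks, 10) } }
    );
});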
For relatively large collections, updating documents one at a time is slow, so bulk updates are recommended instead:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.students.initializeUnorderedBulkOp(),
    counter = 0;
db.students.find({ "marks": { "$exists": true, "$type": 2 } }).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "marks": parseInt(doc.marks, 10) }
    });
    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations
        bulk.execute();
        // Re-initialize every 1000 update statements
        bulk = db.students.initializeUnorderedBulkOp();
    }
});
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
    cursor = db.students.find({ "marks": { "$exists": true, "$type": 2 } });
cursor.forEach(function (doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "marks": parseInt(doc.marks, 10) } }
        }
    });
    // Send the queued updates in batches of 1000
    if (ops.length === 1000) {
        db.students.bulkWrite(ops);
        ops = [];
    }
});
// Send whatever is left in the queue
if (ops.length > 0) db.students.bulkWrite(ops);
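The batching logic used in both bulk variants can be checked without a database. The sketch below is plain JavaScript under that assumption: buildBatches is a hypothetical helper that returns the batches instead of sending them, and the sample documents are made up for illustration.

```javascript
// Group updateOne operations into batches of at most batchSize.
// buildBatches is a hypothetical helper; it collects the batches
// instead of passing each one to bulkWrite().
function buildBatches(docs, batchSize) {
  var batches = [], ops = [];
  docs.forEach(function (doc) {
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { marks: parseInt(doc.marks, 10) } }
      }
    });
    if (ops.length === batchSize) {
      batches.push(ops);
      ops = [];
    }
  });
  // Keep the final, partially filled batch
  if (ops.length > 0) batches.push(ops);
  return batches;
}

// 2500 made-up documents with string marks
var sample = [];
for (var i = 0; i < 2500; i++) {
  sample.push({ _id: i, marks: String(i % 100) });
}

var batches = buildBatches(sample, 1000);
var sizes = batches.map(function (b) { return b.length; });
console.log(sizes.join(", ")); // 1000, 1000, 500
```

The last check after the loop matters: without it, the final 500 operations in this example would never be sent.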
Option 2: Run MapReduce
The second approach is to rewrite your query with MapReduce, where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that processes each input document. For each document, this function converts the marks string to an integer and emits the pair of subject and converted marks. This is where the native JavaScript function parseInt() can be applied. Note: inside the function, this refers to the document that the map-reduce operation is currently processing:
var mapper = function () {
    // Convert the string to an integer before emitting it
    var x = parseInt(this.marks, 10);
    emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments, keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject. The function reduces the valuesMarks array to the sum of its elements.
var reducer = function (keySubject, valuesMarks) {
    return Array.sum(valuesMarks);
};
// Run the map-reduce over the "maths" documents only
db.students.mapReduce(
    mapper,
    reducer,
    {
        out: "example_results",
        query: { subject: "maths" }
    }
);
With your collection, the above writes the MapReduce result to a new collection, db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
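The map, group and reduce steps can also be traced in plain JavaScript without a server. This is only a sketch under stated assumptions: simulateMapReduce is a hypothetical helper, emit is passed in as an argument rather than being the global that MongoDB provides, the reducer sums with Array.prototype.reduce because Array.sum is a mongo shell helper, and the sample documents are the three from the question.

```javascript
var docs = [
  { name: "sam", subject: "maths", marks: "77" },
  { name: "ryan", subject: "bio", marks: "82" },
  { name: "tony", subject: "maths", marks: "86" }
];

// Hypothetical stand-in for db.collection.mapReduce(): filter by the
// query, run the map function on each document, group the emitted
// values by key, then reduce each group to a single value.
function simulateMapReduce(documents, query, mapFn, reduceFn) {
  var groups = {};
  function emit(key, value) {
    (groups[key] = groups[key] || []).push(value);
  }
  documents
    .filter(function (d) { return d.subject === query.subject; })
    .forEach(function (d) { mapFn.call(d, emit); });
  return Object.keys(groups).map(function (key) {
    return { _id: key, value: reduceFn(key, groups[key]) };
  });
}

// Same logic as the mapper above, with emit passed in explicitly
var mapSketch = function (emit) {
  emit(this.subject, parseInt(this.marks, 10));
};

// Same logic as the reducer above, without the shell-only Array.sum
var reduceSketch = function (keySubject, valuesMarks) {
  return valuesMarks.reduce(function (a, b) { return a + b; }, 0);
};

var result = simulateMapReduce(docs, { subject: "maths" }, mapSketch, reduceSketch);
console.log(JSON.stringify(result)); // [{"_id":"maths","value":163}]
```

The "bio" document is dropped by the query filter before the map step, so only the two "maths" marks, 77 and 86, reach the reducer.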