In mongo, how do I use map reduce to get a group by ordered by most recent?
Question
The map reduce examples I see use aggregation functions like count, but what is the best way to get, say, the top 3 items in each category using map reduce?
I'm assuming I could also use the group function, but I was curious about map reduce because sharded environments are said not to support group(). That said, I'd be interested in seeing a group() example as well.
Recommended answer
To keep things simple, I'll assume you have documents of the form:
{category: <int>, score: <int>}
I've created 1000 documents covering 100 categories with:
for (var i = 0; i < 1000; i++) {
    db.foo.save({
        // Math.floor yields a random integer in [0, 99]
        category: Math.floor(Math.random() * 100),
        score: Math.floor(Math.random() * 100)
    });
}
Our mapper is pretty simple: it emits the category as the key, and an object containing an array of scores as the value:
mapper = function () {
    emit(this.category, {top: [this.score]});
};
MongoDB's reducer cannot return an array, and the reducer's output must be of the same type as the values we emit (the reducer may be called again on its own output), so we must wrap the array in an object. We need an array of scores because that lets our reducer compute the top 3 scores:
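reducer = function (key, values) {
    var scores = [];
    // Flatten the "top" arrays of all values into one list of scores
    values.forEach(function (obj) {
        obj.top.forEach(function (score) {
            scores.push(score);
        });
    });
    // Sort numerically, highest first; a bare sort() would compare as strings
    scores.sort(function (a, b) { return b - a; });
    return {top: scores.slice(0, 3)};
};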
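Because MongoDB may feed the reducer's earlier outputs back in as values, it is worth checking that reducing in batches gives the same answer as reducing everything at once. The following is a minimal sketch in plain JavaScript (no MongoDB needed); `reduceTop3` is a hypothetical stand-in for the reducer above, written with an explicit numeric comparator, and the sample scores are made up for illustration.

```javascript
// Hypothetical stand-in for the reducer above, runnable outside MongoDB.
function reduceTop3(key, values) {
  var scores = [];
  values.forEach(function (obj) {
    obj.top.forEach(function (score) {
      scores.push(score);
    });
  });
  // Numeric, descending comparator: without one, sort() compares as strings.
  scores.sort(function (a, b) { return b - a; });
  return { top: scores.slice(0, 3) };
}

// Values shaped the way the mapper emits them, for a single category.
var emitted = [9, 82, 65, 6, 40].map(function (s) { return { top: [s] }; });

// Reduce everything in one call...
var allAtOnce = reduceTop3(1, emitted);

// ...or reduce two partial batches, then re-reduce the partial results.
var partialA = reduceTop3(1, emitted.slice(0, 2));
var partialB = reduceTop3(1, emitted.slice(2));
var reReduced = reduceTop3(1, [partialA, partialB]);

console.log(JSON.stringify(allAtOnce));
console.log(JSON.stringify(reReduced));
```

Both paths should print the same object, which is the property the wrapped `{top: [...]}` shape is meant to guarantee.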
Finally, invoke the map-reduce:
db.foo.mapReduce(mapper, reducer, "top_foos");
Now we have a collection, top_foos, containing one document per category, each holding the top 3 scores across all documents from foo in that category:
{ "_id" : 0, "value" : { "top" : [ 93, 89, 86 ] } }
{ "_id" : 1, "value" : { "top" : [ 82, 65, 6 ] } }
(Your exact values will differ, since the data generator above relies on Math.random().)
You can now use this to query foo for the actual documents having those top scores:
function find_top_scores(categories) {
    var query = [];
    // Build one {category, score} clause per requested category
    db.top_foos.find({_id: {$in: categories}}).forEach(function (topscores) {
        query.push({
            category: topscores._id,
            score: {$in: topscores.value.top}
        });
    });
    return db.foo.find({$or: query});
}
This code doesn't handle ties; or rather, if ties exist, more than 3 documents per category might be returned in the final cursor produced by find_top_scores.
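If at most three documents per category are required, one option is to trim the result on the client. The sketch below assumes the cursor has already been read into an array (for example with `toArray()`); `limitPerCategory` is a hypothetical helper, not part of MongoDB, and which of the tied documents survive is not specified.

```javascript
// Hypothetical helper: keep at most `n` documents per category,
// preferring higher scores.
function limitPerCategory(docs, n) {
  var byCategory = {};
  docs.forEach(function (doc) {
    (byCategory[doc.category] = byCategory[doc.category] || []).push(doc);
  });
  var result = [];
  Object.keys(byCategory).forEach(function (category) {
    // Copy before sorting so the caller's array is left untouched.
    var sorted = byCategory[category].slice().sort(function (a, b) {
      return b.score - a.score;
    });
    result = result.concat(sorted.slice(0, n));
  });
  return result;
}

// Made-up example: three documents tie at 65, so a query on the
// top scores [82, 65, 65] would match all four documents.
var docs = [
  { category: 1, score: 82 },
  { category: 1, score: 65 },
  { category: 1, score: 65 },
  { category: 1, score: 65 }
];
var trimmed = limitPerCategory(docs, 3);
console.log(trimmed.length);
```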
A solution using group would be somewhat similar, though its reducer only has to consider two documents at a time (the current document and the running result), rather than an array of scores for the key.
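As a rough sketch of that idea: the legacy shell helper `db.collection.group()` took a `key`, an `initial` accumulator, and a `reduce` function called once per document with the document and the running result. Since `group()` was deprecated and later removed from MongoDB, the snippet below only exercises such a reduce function in plain JavaScript. `groupReduce` and `simulateGroup` are hypothetical names, the sample data is made up, and the commented-out call shows how the function might have been passed to `group()`.

```javascript
// Reduce function in the style group() expected: it sees one document at a
// time and mutates the accumulator for that document's key.
function groupReduce(doc, acc) {
  acc.top.push(doc.score);
  acc.top.sort(function (a, b) { return b - a; }); // numeric, descending
  acc.top = acc.top.slice(0, 3);
}

// In a legacy shell this might have been invoked roughly as:
//   db.foo.group({ key: { category: 1 }, initial: { top: [] }, reduce: groupReduce });

// Hypothetical in-memory driver that mimics grouping by category.
function simulateGroup(docs, reduce) {
  var groups = {};
  docs.forEach(function (doc) {
    var acc = groups[doc.category] ||
      (groups[doc.category] = { category: doc.category, top: [] });
    reduce(doc, acc);
  });
  return Object.keys(groups).map(function (k) { return groups[k]; });
}

var sample = [
  { category: 0, score: 12 }, { category: 0, score: 93 },
  { category: 0, score: 89 }, { category: 0, score: 86 },
  { category: 1, score: 6 },  { category: 1, score: 82 }
];
var grouped = simulateGroup(sample, groupReduce);
console.log(JSON.stringify(grouped));
```

Keeping the accumulator trimmed to three entries on every call means it never grows with the size of the category.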