In MongoDB, how do I use map-reduce to get a group-by ordered by most recent?

Question

The map-reduce examples I see use aggregation functions like count, but what is the best way to get, say, the top 3 items in each category using map-reduce?

I'm assuming I could also use the group function, but I was curious, since the docs state that sharded environments cannot use group(). That said, I'd also like to see a group() example.

Recommended answer

For simplicity, I'll assume you have documents of the form:

{category: <int>, score: <int>}

I've created 1000 documents covering 100 categories with:

for (var i = 0; i < 1000; i++) {
  db.foo.save({
    category: Math.floor(Math.random() * 100),  // integer in [0, 99]
    score: Math.floor(Math.random() * 100)      // integer in [0, 99]
  });
}

Our mapper is simple: it emits the category as the key and an object containing an array of scores as the value:

mapper = function () {
  emit(this.category, {top:[this.score]});
}

MongoDB's reducer cannot return an array, and its output must have the same shape as the values we emit, because the reducer may be called again on its own earlier output. So we wrap the array in an object. We need an array of scores so that the reducer can compute the top 3 scores:

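reducer = function (key, values) {
  var scores = [];
  // Collect every score from the emitted (or previously reduced) values.
  values.forEach(function (obj) {
    obj.top.forEach(function (score) {
      scores.push(score);
    });
  });
  // Sort numerically, highest first. sort() with no comparator compares
  // values as strings and would rank 6 above 50.
  scores.sort(function (a, b) { return b - a; });
  return {top: scores.slice(0, 3)};
}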
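JavaScript's Array.prototype.sort() compares elements as strings when no comparator is given, so a reducer that ranks scores needs a numeric comparator. The standalone sketch below (plain JavaScript, no MongoDB required; the sample scores are made up) contrasts the two orderings and also checks the property that makes the `{top: [...]}` wrapper necessary: reducing partial results a second time should give the same answer as reducing all values at once.

```javascript
// Standalone copy of the top-3 reducer, with a numeric comparator.
function reducer(key, values) {
  var scores = [];
  values.forEach(function (obj) {
    obj.top.forEach(function (score) { scores.push(score); });
  });
  scores.sort(function (a, b) { return b - a; }); // numeric, descending
  return {top: scores.slice(0, 3)};
}

// Without a comparator, sort() compares "6", "50", ... as strings.
var lexicographic = [6, 50, 82, 65].sort().reverse();
// lexicographic: [82, 65, 6, 50]

// MongoDB may reduce partial batches first and then reduce those results
// again, so both paths must agree.
var part1 = reducer(1, [{top: [6]}, {top: [50]}]);   // {top: [50, 6]}
var part2 = reducer(1, [{top: [82]}, {top: [65]}]);  // {top: [82, 65]}
var rereduced = reducer(1, [part1, part2]);
var direct = reducer(1, [{top: [6]}, {top: [50]}, {top: [82]}, {top: [65]}]);
// both: {top: [82, 65, 50]}
```

Because the reducer only ever keeps three scores, re-reducing stays cheap no matter how many documents a category has.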

Finally, invoke the map-reduce:

db.foo.mapReduce(mapper, reducer, {out: "top_foos"});
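Conceptually, mapReduce runs the mapper on every document, groups the emitted values by key, and calls the reducer on each group. According to the MongoDB documentation, reduce is not called for a key that has only one value, which is another reason the emitted value must already have the `{top: [...]}` shape. The sketch below imitates those phases in plain JavaScript. `mapReduceInMemory` is a hypothetical stand-in, not a MongoDB API, and it installs a temporary global `emit` so the mapper can be written exactly as in the shell.

```javascript
// Hypothetical in-memory stand-in for db.foo.mapReduce(), for illustration only.
function mapReduceInMemory(docs, mapper, reducer) {
  var groups = new Map(); // key -> array of emitted values
  // Temporary global emit(), so the mapper can call emit() as in the shell.
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    docs.forEach(function (doc) {
      mapper.call(doc); // `this` is the current document
    });
  } finally {
    delete globalThis.emit;
  }
  var out = [];
  groups.forEach(function (values, key) {
    // Like MongoDB, skip reduce when a key has only one emitted value.
    var value = values.length === 1 ? values[0] : reducer(key, values);
    out.push({_id: key, value: value});
  });
  out.sort(function (a, b) { return a._id - b._id; });
  return out;
}

var mapper = function () {
  emit(this.category, {top: [this.score]});
};

var reducer = function (key, values) {
  var scores = [];
  values.forEach(function (obj) {
    obj.top.forEach(function (score) { scores.push(score); });
  });
  scores.sort(function (a, b) { return b - a; });
  return {top: scores.slice(0, 3)};
};

var docs = [
  {category: 0, score: 93}, {category: 0, score: 12},
  {category: 0, score: 89}, {category: 0, score: 86},
  {category: 1, score: 40}
];
var result = mapReduceInMemory(docs, mapper, reducer);
// result: [{_id: 0, value: {top: [93, 89, 86]}}, {_id: 1, value: {top: [40]}}]
```

Category 1 has a single document, so its value is stored exactly as emitted; it still has the `{top: [...]}` shape only because the mapper already produces that shape.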

Now we have a collection with one document per category, holding the top 3 scores among all documents in foo for that category:

{ "_id" : 0, "value" : { "top" : [ 93, 89, 86 ] } }
{ "_id" : 1, "value" : { "top" : [ 82, 65, 6 ] } }

(Your exact values will differ, since the data generator above uses Math.random().)

You can now use this to query foo for the actual documents having those top scores:

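function find_top_scores(categories) {
  var query = [];
  // Build one {category, score: {$in: [...]}} clause per category in top_foos.
  db.top_foos.find({_id: {$in: categories}}).forEach(function (topscores) {
    query.push({
      category: topscores._id,
      score: {$in: topscores.value.top}
    });
  });
  // $or requires a non-empty array, so pass at least one existing category.
  return db.foo.find({$or: query});
}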
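To see the filter that find_top_scores builds without running MongoDB, here is a pure-JavaScript sketch. `buildTopScoresQuery` is a hypothetical helper that mirrors the loop above; it takes an array of documents shaped like those in top_foos (the sample values are copied from the example output).

```javascript
// Hypothetical helper (not part of MongoDB): turns top_foos-style documents
// into the {$or: [...]} filter that find_top_scores passes to db.foo.find().
function buildTopScoresQuery(topFoos) {
  return {
    $or: topFoos.map(function (doc) {
      return {category: doc._id, score: {$in: doc.value.top}};
    })
  };
}

var topFoos = [
  {_id: 0, value: {top: [93, 89, 86]}},
  {_id: 1, value: {top: [82, 65, 6]}}
];
var query = buildTopScoresQuery(topFoos);
console.log(JSON.stringify(query));
// {"$or":[{"category":0,"score":{"$in":[93,89,86]}},{"category":1,"score":{"$in":[82,65,6]}}]}
```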

This code doesn't handle ties: if several documents share one of a category's top scores, the final cursor produced by find_top_scores may return more than 3 documents for that category.
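The tie behavior can be reproduced without a database. In this hypothetical in-memory example, two documents share the third-highest score, so the equivalent of the `$in` filter matches four documents. Sorting by score and keeping the first three is one possible way to cap the result. It breaks ties by input order (Array.prototype.sort is stable as of ES2019), which may or may not be what you want.

```javascript
// Hypothetical data: category 0's top-3 scores are [93, 89, 86],
// but two documents scored 86.
var foo = [
  {_id: "a", category: 0, score: 93},
  {_id: "b", category: 0, score: 89},
  {_id: "c", category: 0, score: 86},
  {_id: "d", category: 0, score: 86},
  {_id: "e", category: 0, score: 12}
];
var top = [93, 89, 86];

// Equivalent of {category: 0, score: {$in: top}}: every tied document matches.
var matched = foo.filter(function (doc) {
  return doc.category === 0 && top.indexOf(doc.score) !== -1;
});
// matched: a, b, c, d (four documents)

// One way to keep at most 3: sort by score, highest first, then truncate.
var capped = matched
  .slice()
  .sort(function (x, y) { return y.score - x.score; })
  .slice(0, 3);
// capped: a, b, c
```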

The solution using group would be similar, except that its reduce function sees one document at a time together with the running result for the key, rather than an array of values for the key.
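For the group() example the question asks about, here is a sketch of a reduce function in the `reduce(curr, result)` form that group() uses, seeded with `initial: {top: []}`. group() was deprecated in MongoDB 3.4 and removed in 4.2, so the shell call appears only as a comment and is untested here. `groupInMemory` is a hypothetical driver that mimics group()'s per-key accumulation so the reducer can be tried in plain JavaScript.

```javascript
// `curr` is the current document; `result` is the running aggregate for its
// key, seeded from `initial`. group() expects reduce to mutate `result`.
function groupReduce(curr, result) {
  result.top.push(curr.score);
  result.top.sort(function (a, b) { return b - a; }); // numeric, descending
  if (result.top.length > 3) result.top.length = 3;   // keep only the top 3
}

// In a legacy (pre-4.2) shell this would be passed as:
//   db.foo.group({key: {category: 1}, initial: {top: []}, reduce: groupReduce})

// Hypothetical in-memory driver: one result per key, each seeded with a fresh
// copy of `initial` plus the key field.
function groupInMemory(docs, keyField, initial, reduce) {
  var results = new Map();
  docs.forEach(function (doc) {
    var k = doc[keyField];
    if (!results.has(k)) {
      var seed = JSON.parse(JSON.stringify(initial));
      seed[keyField] = k;
      results.set(k, seed);
    }
    reduce(doc, results.get(k));
  });
  return Array.from(results.values());
}

var grouped = groupInMemory(
  [{category: 0, score: 12}, {category: 0, score: 93}, {category: 0, score: 86},
   {category: 0, score: 89}, {category: 1, score: 40}],
  "category", {top: []}, groupReduce);
// grouped: [{top: [93, 89, 86], category: 0}, {top: [40], category: 1}]
```

Since each call adds only one score to an array of at most three, re-sorting on every call stays cheap.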
