MongoDB, MapReduce and sorting


Question

I might be a bit in over my head on this, as I'm still learning the ins and outs of MongoDB, but here goes.

Right now I'm working on a tool to search/filter through a dataset, sort it by an arbitrary datapoint (e.g. popularity) and then group it by an id. The only way I can see to do this is through Mongo's MapReduce functionality.

I can't use .group() because I'm working with more than 10,000 keys and I also need to be able to sort the dataset.

My MapReduce code is working just fine, except for one thing: sorting. Sorting just doesn't want to work at all.

db.runCommand({
  'mapreduce': 'products',
  'map': function() {
    emit({
      product_id: this.product_id,
      popularity: this.popularity
    }, 1);
  },
  'reduce': function(key, values) {
    var sum = 0;
    values.forEach(function(v) {
      sum += v;
    });

    return sum;  
  },
  'query': {category_id: 20},
  'out': {inline: 1},
  'sort': {popularity: -1}
});

I already have a descending index on the popularity datapoint, so a missing index is definitely not the reason the sort fails:

{ 
  "v" : 1, 
  "key" : { "popularity" : -1 }, 
  "ns" : "app.products", 
  "name" : "popularity_-1" 
}

I just cannot figure out why it doesn't want to sort.

I can't work around this by writing the result set to another collection instead of inlining it and then running .find().sort({popularity: -1}) on that, because of the way this feature is going to work.

Recommended answer

First of all, Mongo's map/reduce is not designed to be used as a query tool (as it is in CouchDB); it is designed for running background tasks. I use it at work to analyze traffic data.

What you are doing wrong, however, is applying sort() to your input. That is useless, because once the map() stage is done the intermediate documents are sorted by their keys. Because your key is a document, the output is sorted by product_id, then popularity.
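The effect described above can be sketched in plain JavaScript, without a MongoDB server. This is only a simplified model: the `compareKeys` helper and the three sample documents are assumptions made for illustration, and the comparison of key documents field by field is a stand-in for the server's real ordering rules, not its actual implementation.

```javascript
// Simplified model (assumption): key documents are compared field by field,
// in the order the fields were written. Hypothetical helper, not a MongoDB API.
function compareKeys(a, b) {
  var fields = Object.keys(a);
  for (var i = 0; i < fields.length; i++) {
    var f = fields[i];
    if (a[f] < b[f]) return -1;
    if (a[f] > b[f]) return 1;
  }
  return 0;
}

// Hypothetical input, already sorted by popularity descending,
// which is what the 'sort' option does to the input documents.
var sortedInput = [
  { product_id: 7, popularity: 40 },
  { product_id: 3, popularity: 25 },
  { product_id: 9, popularity: 10 }
];

// The map step from the question: emit a key document per input document.
var emittedKeys = sortedInput.map(function (doc) {
  return { product_id: doc.product_id, popularity: doc.popularity };
});

// The output is ordered by key, so the input order is lost.
var orderedByKey = emittedKeys.slice().sort(compareKeys);

var productIdOrder = orderedByKey.map(function (k) { return k.product_id; });
var popularityAfterKeySort = orderedByKey.map(function (k) { return k.popularity; });

console.log(productIdOrder);          // [ 3, 7, 9 ]
console.log(popularityAfterKeySort);  // [ 25, 40, 10 ]
```

In this model the result is ordered by product_id, the first field of the key, so the popularity values come out in no useful order even though the input was sorted.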

This is how I generated my dataset:

function generate_dummy_data() {
    // Insert documents with a random category_id (0-29) and popularity (0-49).
    for (var i = 2; i < 1000000; i++) {
        db.foobar.save({
            _id: i,
            category_id: parseInt(Math.random() * 30),
            popularity: parseInt(Math.random() * 50)
        });
    }
}

And this is my map/reduce task:

var data = db.runCommand({
  'mapreduce': 'foobar',
  'map': function() {
    emit({
      sorting: this.popularity * -1,
      product_id: this._id,
      popularity: this.popularity,
    }, 1);
  },
  'reduce': function(key, values) {
    var sum = 0;
    values.forEach(function(v) {
      sum += v;
    });

    return sum;  
  },
  'query': {category_id: 20},
  'out': {inline: 1},
});

And this is the end result (too long to paste here):

http://cesarodas.com/results.txt

This works because now we're sorting by the key fields sorting, product_id and popularity, in that order. You can play with the sorting however you like; just remember that the final sorting is by key, regardless of how your input is sorted.
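As a rough check of why the negated field gives a descending order, here is a small plain-JavaScript sketch. It assumes, as a simplification, that keys are compared field by field in the order written; `compareKeys` and the four sample documents are made up for illustration and are not part of MongoDB.

```javascript
// Simplified model (assumption): key documents are compared field by field,
// in the order the fields were written. Hypothetical helper, not a MongoDB API.
function compareKeys(a, b) {
  var fields = Object.keys(a);
  for (var i = 0; i < fields.length; i++) {
    var f = fields[i];
    if (a[f] < b[f]) return -1;
    if (a[f] > b[f]) return 1;
  }
  return 0;
}

// Hypothetical documents shaped like the generated 'foobar' data.
var docs = [
  { _id: 2, popularity: 10 },
  { _id: 3, popularity: 40 },
  { _id: 4, popularity: 25 },
  { _id: 5, popularity: 40 }
];

// Same key layout as the map function above: the negated popularity comes
// first, so an ascending key order is a descending popularity order.
var keys = docs.map(function (doc) {
  return {
    sorting: doc.popularity * -1,
    product_id: doc._id,
    popularity: doc.popularity
  };
});

var ordered = keys.slice().sort(compareKeys);

var popularityOrder = ordered.map(function (k) { return k.popularity; });
var idOrder = ordered.map(function (k) { return k.product_id; });

console.log(popularityOrder); // [ 40, 40, 25, 10 ]
console.log(idOrder);         // [ 3, 5, 4, 2 ]
```

The two documents with popularity 40 tie on the first field, so the second key field, product_id, decides their relative order.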

Anyway, as I said before, you should avoid doing queries with Map/Reduce; it was designed for background processing. If I were you, I would design my data in such a way that I could access it with simple queries. There is always a trade-off: in this case, more complex inserts/updates in exchange for simple queries (that's how I see MongoDB).
