MongoDB, MapReduce and sorting


Question

I might be in a bit over my head here, as I'm still learning the ins and outs of MongoDB, but here goes.

Right now I'm working on a tool that searches and filters a dataset, sorts it by an arbitrary data point (e.g. popularity), and then groups it by an id. The only way I can see to do this is through Mongo's MapReduce functionality.

I can't use .group() because I'm working with more than 10,000 keys, and I also need to be able to sort the dataset.

My MapReduce code is working just fine, except for one thing: sorting. Sorting just doesn't want to work at all.

db.runCommand({
  'mapreduce': 'products',
  'map': function() {
    emit({
      product_id: this.product_id,
      popularity: this.popularity
    }, 1);
  },
  'reduce': function(key, values) {
    var sum = 0;
    values.forEach(function(v) {
      sum += v;
    });

    return sum;  
  },
  'query': {category_id: 20},
  'out': {inline: 1},
  'sort': {popularity: -1}
});

I already have a descending index on the popularity data point, so a missing index is definitely not the reason it fails:

{ 
  "v" : 1, 
  "key" : { "popularity" : -1 }, 
  "ns" : "app.products", 
  "name" : "popularity_-1" 
}

I just cannot figure out why it doesn't want to sort.

Because of the way this feature is going to work, I also can't output the result set to another collection (instead of inlining it) and then run .find().sort({popularity: -1}) on that.

Recommended answer

First of all, Mongo map/reduce is not designed to be used as a query tool (as it is in CouchDB); it is designed for running background tasks. I use it at work to analyze traffic data.

What you are doing wrong, however, is applying the sort to your input. That is useless here, because once the map() stage is done the intermediate documents are sorted by key. Because your key is a document, the output is sorted by product_id, then popularity.
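To make the point about key ordering concrete, here is a minimal sketch in plain JavaScript (no MongoDB required). It assumes a simplified model in which keys with the same fields and numeric values are compared field by field, in the order the fields were written. The compareKeys helper and the sample keys are hypothetical and only illustrate the idea; they are not MongoDB's actual implementation.

```javascript
// Simplified model (assumption): same-shaped key documents are compared
// field by field, in the order the fields appear in the key.
function compareKeys(a, b) {
  var fields = Object.keys(a);
  for (var i = 0; i < fields.length; i++) {
    var diff = a[fields[i]] - b[fields[i]];
    if (diff !== 0) {
      return diff; // the first differing field decides the order
    }
  }
  return 0;
}

// Hypothetical keys shaped like the ones emitted in the question.
var questionKeys = [
  { product_id: 3, popularity: 30 },
  { product_id: 1, popularity: 10 },
  { product_id: 2, popularity: 50 }
];

var sortedKeys = questionKeys.slice().sort(compareKeys);

// product_id comes first in the key, so it decides the order and
// popularity ends up unordered.
console.log(sortedKeys.map(function (k) { return k.popularity; }));
// prints [ 10, 50, 30 ]
```

Under this model, popularity only acts as a tie-breaker for equal product_id values, which is why sorting the input has no visible effect on the output.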

This is how I generated my dataset:

function generate_dummy_data() {
    // Insert roughly one million documents with a random category and popularity.
    for (var i = 2; i < 1000000; i++) {
        db.foobar.save({
            _id: i,
            category_id: Math.floor(Math.random() * 30),
            popularity: Math.floor(Math.random() * 50)
        });
    }
}

And this is my map/reduce task:

var data = db.runCommand({
  'mapreduce': 'foobar',
  'map': function() {
    emit({
      sorting: this.popularity * -1,
      product_id: this._id,
      popularity: this.popularity,
    }, 1);
  },
  'reduce': function(key, values) {
    var sum = 0;
    values.forEach(function(v) {
      sum += v;
    });

    return sum;  
  },
  'query': {category_id: 20},
  'out': {inline: 1},
});

And this is the end result (too long to paste here):

http://cesarodas.com/results.txt

This works because the output is now sorted by sorting, product_id, popularity. You can play with the sorting however you like; just remember that the final order is determined by the key, regardless of how your input is sorted.
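The effect of putting a negated value first in the key can be sketched with a small in-memory stand-in for the map/reduce flow. Everything here is an assumption made for illustration: miniMapReduce and compareKeys are hypothetical helpers, emit is passed to the map function as an argument instead of being a global, and reduce is called even for keys with a single value. It is not MongoDB's API, only a model of "group by key, order by key, then reduce".

```javascript
// Simplified key comparison (assumption): field by field, in key order.
function compareKeys(a, b) {
  return Object.keys(a).reduce(function (result, field) {
    return result !== 0 ? result : a[field] - b[field];
  }, 0);
}

// Hypothetical in-memory stand-in for map/reduce, not MongoDB's API.
function miniMapReduce(docs, map, reduce) {
  var groups = {}; // serialized key -> { key, values }

  function emit(key, value) {
    var id = JSON.stringify(key);
    if (!groups[id]) {
      groups[id] = { key: key, values: [] };
    }
    groups[id].values.push(value);
  }

  // Map phase: each document is bound to `this`, as in the answer's code.
  docs.forEach(function (doc) { map.call(doc, emit); });

  // Results are ordered by key, whatever order the input arrived in.
  return Object.keys(groups)
    .map(function (id) { return groups[id]; })
    .sort(function (a, b) { return compareKeys(a.key, b.key); })
    .map(function (g) { return { _id: g.key, value: reduce(g.key, g.values) }; });
}

// Hypothetical sample documents.
var docs = [
  { _id: 1, category_id: 20, popularity: 10 },
  { _id: 2, category_id: 20, popularity: 50 },
  { _id: 3, category_id: 20, popularity: 30 }
];

var results = miniMapReduce(
  docs,
  function (emit) {
    // Same key layout as the answer: the negated popularity comes first.
    emit({ sorting: this.popularity * -1, product_id: this._id, popularity: this.popularity }, 1);
  },
  function (key, values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0);
  }
);

console.log(results.map(function (r) { return r._id.popularity; }));
// prints [ 50, 30, 10 ]
```

Because the key order is ascending, negating popularity is what turns it into a descending sort on the original value.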

Anyway, as I said before, you should avoid doing queries with Map/Reduce; it was designed for background processing. If I were you, I would design my data in such a way that I could access it with simple queries. There is always a trade-off in this case: more complex inserts/updates in exchange for simple queries (that's how I see MongoDB).
