MapReduce aggregation based on attributes contained outside of document

Question

Say I have a collection of 'activities', each of which has a name, cost and location:

{_id : 1 , name: 'swimming', cost: '3.40', location: 'kirkstall'}
{_id : 2 , name: 'cinema', cost: '6.50', location: 'hyde park'}
{_id : 3 , name: 'gig', cost: '10.00', location: 'hyde park'}

I also have a people collection which records, for each activity, how many times they plan to do each in a year:

{_id : 1 , name: 'russell', activities : { 1 : 9 , 2 : 4 , 3 : 21 }}

I don't want to denormalise the activities' attributes by putting them in the person collection for a number of reasons.

First of all, this is about planning, so if the cost of an activity changes, it would need to change in the person collection too. So I'd have to update all person records.

Secondly, I will probably want to add some other attributes to the activity collection at some point, and want to avoid having to add them to every activity in every record in the person collection when I do.

However, now I want to do a MapReduce to find out how many activities are planned in total by all people, grouped by location.

This means that during a MapReduce on the person collection I need to know the location of the activities they have planned. Can anyone think of a nice way to do this?
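For concreteness, the desired result can be sketched in plain JavaScript with no MongoDB involved, using the sample documents above. This is only an illustration of the target output: it assumes `activities` on a person is a plain object mapping activity `_id` to planned count, and `plannedByLocation` is a hypothetical helper name.

```javascript
// Sample data copied from the question. `activities` on a person is
// assumed to be a plain object mapping activity _id -> planned count.
const activities = [
  { _id: 1, name: 'swimming', cost: '3.40', location: 'kirkstall' },
  { _id: 2, name: 'cinema', cost: '6.50', location: 'hyde park' },
  { _id: 3, name: 'gig', cost: '10.00', location: 'hyde park' },
];
const people = [
  { _id: 1, name: 'russell', activities: { 1: 9, 2: 4, 3: 21 } },
];

// Hypothetical helper: total planned activities per location.
function plannedByLocation(people, activities) {
  // Build the activity _id -> location lookup once, up front.
  const locationById = {};
  for (const a of activities) locationById[a._id] = a.location;

  const totals = {};
  for (const person of people) {
    for (const id of Object.keys(person.activities)) {
      const location = locationById[id];
      totals[location] = (totals[location] || 0) + person.activities[id];
    }
  }
  return totals;
}

const totals = plannedByLocation(people, activities);
console.log(totals);
```

For the single sample person this gives 9 for kirkstall and 25 (4 + 21) for hyde park.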

My best shot at the moment (which is pretty rubbish) is to create a stored JavaScript function that accepts an array of activity_ids, queries the activities collection, and returns a map of activity_id to location. I'd then call this inside the map function and look up locations from it. As I said, though, this is pretty rubbish, because the same query on the activities collection would run once for every document in the people collection.

Recommended answer

I did this by wrapping the MapReduce in some stored javascript.

function (query) {

  var one = db.people.findOne(query);
  var activity_ids = [];
  for (var k in one.activities){
    activity_ids.push(parseInt(k));
  }

  var activity_location_map = {};
  db.activities.find({_id : {$in : activity_ids}}).forEach(function(a){
    activity_location_map[a._id] = a.location;
  });


  return db.people.mapReduce(
    function map(){
      for (var k in this.activities){
        emit({location : activity_location_map[k]} , { total: this.activities[k] });
      }
    },
    function reduce(key, values){
      var reduced = {total: 0};
      values.forEach(function(value){
        reduced.total += value.total;
      });

      return reduced;
    },
    {out : {inline: true}, scope : { activity_location_map : activity_location_map }}
  ).results;
}

Annoying, and messy, but it works, and I can't think of owt better.
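The lookup-then-scope idea can be checked without a server. Below is a minimal sketch in plain JavaScript, under stated assumptions: `localMapReduce` is a hypothetical in-memory stand-in and not MongoDB's API, and it passes `emit` and the scope object as arguments where MongoDB would expose them as globals inside `map`. The second person is made-up sample data, added only so that `reduce` has values from more than one document to combine.

```javascript
// Hypothetical in-memory stand-in for mapReduce (NOT MongoDB's implementation):
// runs map on each document with `this` bound to it, groups emitted values
// by key, then calls reduce once per key (even for single-value keys).
function localMapReduce(docs, map, reduce, scope) {
  const groups = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key); // object keys are compared by their JSON form
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  for (const doc of docs) map.call(doc, emit, scope);
  return [...groups.values()].map(
    (g) => ({ _id: g.key, value: reduce(g.key, g.values) })
  );
}

// Lookup built once, as the wrapper function does before calling mapReduce.
const activity_location_map = { 1: 'kirkstall', 2: 'hyde park', 3: 'hyde park' };

const people = [
  { _id: 1, name: 'russell', activities: { 1: 9, 2: 4, 3: 21 } },
  { _id: 2, name: 'sam', activities: { 1: 2, 3: 1 } }, // made-up second person
];

// Same logic as the answer's map, with emit and scope passed explicitly.
function map(emit, scope) {
  for (const k in this.activities) {
    emit({ location: scope.activity_location_map[k] }, { total: this.activities[k] });
  }
}

// Same logic as the answer's reduce: sum the per-person counts.
function reduce(key, values) {
  const reduced = { total: 0 };
  values.forEach((value) => { reduced.total += value.total; });
  return reduced;
}

const results = localMapReduce(people, map, reduce, { activity_location_map });
console.log(JSON.stringify(results));
```

With this sample data the totals are 11 for kirkstall (9 + 2) and 26 for hyde park (4 + 21 + 1). Each activity is emitted exactly once per person, which is why `map` contains a single `emit` call.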
