Reduce function on Map Reduce showing incorrect results -- why?


Question

I have a data structure that keeps track of people in different cities:

// in db.persons
{
  name: "John",
  city: "Seattle"
},
{
  name: "Bill",
  city: "Portland"
}

I want to run a map reduce to get a list of how many people are in each city, so the result will look like this:

{
  _id: "Seattle",
  value: 10
}

My map and reduce functions look like this:

map = function(){
  var city = this.city;
  emit(city, 1);
};


reduce = function(key, values){
    var result = 0;
    values.forEach(function(value){
      // adds 1 per entry, regardless of the entry's value
      result += 1;
    });
    return result;
};

Very simple stuff: I figured it would take the city as a key, then add one to the result for each matching city it found. However, in the resulting map reduce output, the value was off by a large factor. Switching my reduce function to:

reduce = function(key, values){
    var result = 0;
    values.forEach(function(value){
      result += value;
    });
    return result;
}

And adding the value to the result (which should be 1, as I understand it from my emit function) returned correct results.

Why are the results different? Wouldn't my value be 1 in the reduce function?

Recommended answer

This happens because MongoDB can invoke the reduce function multiple times for the same key. Here's a simple worked example:

Let's say you have just three documents in your database, each with the same 'city' of 'Seattle'. After the emit phase, you will have a set of emitted objects which look like:

{'Seattle' : 1}, {'Seattle' : 1}, {'Seattle' : 1}

After the emit phase has completed, the reduce phase starts. In the simplest case, the reduce function will be called as reduce('Seattle', [1,1,1]). In this case, your first function would work correctly. However, the reduce function may be called multiple times:

reduce('Seattle', [1,1]) -> {'Seattle' : 2}, {'Seattle' : 1}

reduce('Seattle', [2,1])

In this case, your first reduce function would return 2 after the second reduce call as there are two items in the list of values. In your second reduce function, you correctly add the values together rather than just counting them, which gives the correct answer.
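To see the difference without a database, here is a minimal sketch in plain JavaScript that mimics the two call patterns above. The two reducers are copied from the question; `countingReduce`, `summingReduce`, and `reduceInTwoSteps` are names I made up for illustration, and the two-step helper only imitates one possible way MongoDB might split the work.

```javascript
// First reducer from the question: counts entries, ignores their contents.
function countingReduce(key, values) {
  var result = 0;
  values.forEach(function (value) {
    result += 1;
  });
  return result;
}

// Second reducer from the question: adds up the (possibly partial) counts.
function summingReduce(key, values) {
  var result = 0;
  values.forEach(function (value) {
    result += value;
  });
  return result;
}

// Hypothetical helper: reduce the first two values, then feed that partial
// result back in together with the remaining values.
function reduceInTwoSteps(reduceFn, key, values) {
  var partial = reduceFn(key, values.slice(0, 2));
  return reduceFn(key, [partial].concat(values.slice(2)));
}

var emitted = [1, 1, 1]; // three documents with city 'Seattle'

console.log(countingReduce('Seattle', emitted));                   // 3
console.log(reduceInTwoSteps(countingReduce, 'Seattle', emitted)); // 2
console.log(summingReduce('Seattle', emitted));                    // 3
console.log(reduceInTwoSteps(summingReduce, 'Seattle', emitted));  // 3
```

The counting version only looks right when everything arrives in a single call; once a partial result is passed back in, the 2 gets counted as one entry.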

I personally think the CouchDB docs explain slightly better why a reduce function needs to be commutative and associative over its array of input values.
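If you want to sanity-check a reducer before running it, something like the following rough sketch may help. `isReReduceSafe` is a hypothetical helper of mine, not part of MongoDB or CouchDB, and it is only a spot check on one sample input: it tries each split point and a reversed ordering, which can expose a broken reducer but cannot prove one correct.

```javascript
// Hypothetical checker: true if reduceFn gives the same answer when a prefix
// of the values is reduced first and its result is fed back in, and when the
// values arrive in reverse order.
function isReReduceSafe(reduceFn, key, values) {
  var expected = reduceFn(key, values);
  // Start at 2 so every partial call receives at least two values.
  for (var i = 2; i < values.length; i++) {
    var partial = reduceFn(key, values.slice(0, i));
    var combined = reduceFn(key, [partial].concat(values.slice(i)));
    if (combined !== expected) return false;
  }
  var reversed = values.slice().reverse();
  return reduceFn(key, reversed) === expected;
}

// Sums the values: partial results combine correctly.
function sumReduce(key, values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Counts the entries: a partial result is treated as a single entry.
function countReduce(key, values) {
  return values.length;
}

console.log(isReReduceSafe(sumReduce, 'Seattle', [1, 1, 1, 1]));   // true
console.log(isReReduceSafe(countReduce, 'Seattle', [1, 1, 1, 1])); // false
```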
