MongoDB MapReduce: Not working as expected for more than 1000 records

Problem description

I wrote a MapReduce function where the records are emitted in the following format:

{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}

where userid is the key and the rest is the value for that key. After the MapReduce function, I want to get the result in the following format:

{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}

To achieve this I wrote the following reduce function. I know this can be achieved with a group-by in both the aggregation framework and MapReduce, but we require similar functionality for a more complex scenario, so I am taking this approach.

var reducefn = function(key,values){
    var result = {allEventCount:0, events:[]};
    values.forEach(function(value){
        var notfound=true;
        for(var n = 0; n < result.events.length; n++){
            eventObj = result.events[n];
            for(ev in eventObj){
                if(ev==value.event){
                    result.events[n][ev] += value.allEventCount;
                    notfound=false;
                    break;
                }
            }
        }
        if(notfound==true){
            var newEvent={};
            newEvent[value.event]=1;
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
}

This runs perfectly for 1000 records, but when there are 3k or 10k records, the result I get looks something like this:

{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 }, 
{"downgrade" : 2 } ]} }

I can't understand where this undefined came from, and the sum of the individual events is also less than allEventCount. All the docs in the collection have a non-empty event field, so there should be no chance of undefined.

MongoDB version: 2.2.1. Environment: local machine, no sharding.

In the reduce function, why does the operation result.events[n][ev] += value.allEventCount; fail when the similar operation result.allEventCount += value.allEventCount; works?

The corrected answer, as suggested by johnyHK

Reduce function:

var reducefn = function(key,values){
    var result = {totEvents:0, event:[]};
    values.forEach(function(value){
        value.event.forEach(function(eventElem){
            var notfound=true;
            for(var n = 0; n < result.event.length; n++){
                eventObj = result.event[n];
                for(ev in eventObj){
                    for(evv in eventElem){
                        if(ev==evv){
                            result.event[n][ev] += eventElem[evv];
                            notfound=false;
                            break;
                        }
                    }
                }
            }
            if(notfound==true){
                result.event.push(eventElem);
            }
        });
        result.totEvents += value.totEvents;
    });
    return result;
}

Recommended answer

The shape of the value you emit from your map function must be the same as the shape of the value returned from your reduce function, because the results of a reduce can get fed back into reduce when processing a large number of docs (as in this case).
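To make the failure concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB required) that feeds a partial result back into a reduce written like the one in the question. The name reduceOriginal, the sample values, and the two-step batching are assumptions made for illustration. They only mimic the idea of a re-reduce and are not MongoDB's actual scheduling.

```javascript
// Condensed version of the reduce function from the question.
// It expects every value to look like {event: "login", allEventCount: 1}.
function reduceOriginal(key, values) {
  var result = { allEventCount: 0, events: [] };
  values.forEach(function (value) {
    var notfound = true;
    for (var n = 0; n < result.events.length; n++) {
      for (var ev in result.events[n]) {
        if (ev == value.event) {
          result.events[n][ev] += value.allEventCount;
          notfound = false;
          break;
        }
      }
    }
    if (notfound) {
      var newEvent = {};
      newEvent[value.event] = 1; // key becomes "undefined" if value.event is missing
      result.events.push(newEvent);
    }
    result.allEventCount += value.allEventCount;
  });
  return result;
}

// Illustrative emitted values for a single user.
var a = { event: "adduser", allEventCount: 1 };
var b = { event: "login", allEventCount: 1 };
var c = { event: "login", allEventCount: 1 };

// First pass over part of the input.
var partial = reduceOriginal("xyz", [a, b]);

// Re-reduce: the partial result is passed in as just another value.
// It has an "events" array but no "event" field, so value.event is undefined.
var finalResult = reduceOriginal("xyz", [partial, c]);

console.log(JSON.stringify(finalResult));
```

In this sketch the second call sees a value without an event field, so it records an "undefined" entry and discards the per-event counts collected in the first pass, while allEventCount still adds up. That matches the symptoms described in the question.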

So you need to change your emit to emit docs like this:

{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}

and then update your reduce function accordingly.
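As a rough sketch of that change, the map and reduce below emit and return the same shape, so a partial result can be reduced again safely. The field names userid and event are assumed from the question, the emit stub and the sample docs are hypothetical stand-ins so the code runs in plain Node.js, and the index lookup is just one possible way to merge counts.

```javascript
// Hypothetical stand-in for MongoDB's emit(), so the sketch runs outside the server.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map: emits a value with the same shape that reduce returns.
function mapfn() {
  var ev = {};
  ev[this.event] = 1;
  emit(this.userid, { events: [ev], allEventCount: 1 });
}

// Reduce: merges per-event counts; input and output share one shape.
function reducefn(key, values) {
  var result = { events: [], allEventCount: 0 };
  var index = {}; // event name -> position in result.events
  values.forEach(function (value) {
    value.events.forEach(function (eventObj) {
      for (var name in eventObj) {
        if (!Object.prototype.hasOwnProperty.call(index, name)) {
          index[name] = result.events.length;
          var fresh = {};
          fresh[name] = 0;
          result.events.push(fresh);
        }
        result.events[index[name]][name] += eventObj[name];
      }
    });
    result.allEventCount += value.allEventCount;
  });
  return result;
}

// Sample documents, made up for illustration.
var docs = [
  { userid: "xyz", event: "adduser" },
  { userid: "xyz", event: "login" },
  { userid: "xyz", event: "login" }
];
docs.forEach(function (doc) { mapfn.call(doc); });
var values = emitted.map(function (e) { return e.value; });

// Reducing everything at once and reducing in two steps should agree.
var onePass = reducefn("xyz", values);
var twoPass = reducefn("xyz", [reducefn("xyz", values.slice(0, 2)), values[2]]);

console.log(JSON.stringify(onePass));
console.log(JSON.stringify(twoPass));
```

Comparing a single-pass result with a two-step result on a small sample like this is a cheap way to check that a reduce function tolerates being fed its own output before running it on the full collection.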
