MongoDB MapReduce: Not working as expected for more than 1000 records


Problem description

I wrote a MapReduce job whose map function emits records in the following format:

{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}

Here userid is the key and the rest is the value for that key. After the MapReduce job runs, I want to get the result in the following format:

{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}

To achieve this, I wrote the following reduce function. I know this can be done with a group-by in both the aggregation framework and MapReduce, but we need similar functionality for a more complex scenario, so I am taking this approach.

var reducefn = function(key,values){
    var result = {allEventCount:0, events:[]};
    values.forEach(function(value){
        var notfound=true;
        for(var n = 0; n < result.events.length; n++){
            eventObj = result.events[n];
            for(ev in eventObj){
                if(ev==value.event){
                    result.events[n][ev] += value.allEventCount;
                    notfound=false;
                    break;
                }
            }
        }
        if(notfound==true){
            var newEvent={}
            newEvent[value.event]=1;
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
}

This works perfectly when I run it on 1000 records. With 3k or 10k records, the result looks something like this:

{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 }, 
{"downgrade" : 2 } ]} }

I cannot work out where this undefined comes from, and the sum of the individual event counts is less than allEventCount. Every document in the collection has a non-empty event field, so the event should never be undefined.

MongoDB version: 2.2.1. Environment: local machine, no sharding.

In the reduce function, why does the operation result.events[n][ev] += value.allEventCount; fail when the similar operation result.allEventCount += value.allEventCount; succeeds?

The corrected version, as suggested by johnyHK

Reduce function:

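var reducefn = function(key,values){
    // Each value is expected as {totEvents: <n>, event: [{<name>: <count>}, ...]},
    // the same shape this function returns, so its output can be reduced again.
    var result = {totEvents:0, event:[]};
    values.forEach(function(value){
        value.event.forEach(function(eventElem){
            var notfound=true;
            for(var n = 0; n < result.event.length; n++){
                var eventObj = result.event[n];
                for(var ev in eventObj){
                    for(var evv in eventElem){
                        if(ev==evv){
                            // Event already present: add the incoming count.
                            result.event[n][ev] += eventElem[evv];
                            notfound=false;
                            break;
                        }
                    }
                }
            }
            if(notfound==true){
                // First occurrence of this event for the key.
                result.event.push(eventElem);
            }
        });
        result.totEvents += value.totEvents;
    });
    return result;
}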

Recommended answer

The shape of the object you emit from your map function must be the same as the shape of the object returned from your reduce function, because the results of a reduce can be fed back into reduce when processing a large number of docs (as in this case).
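To see where the undefined key comes from, the re-reduce step can be simulated outside MongoDB. The sketch below is plain JavaScript (it should run in Node.js or the mongo shell). brokenReduce is a condensed copy of the question's reduce function under a hypothetical name, the sample values are made up, and the emitted count field is assumed to be named allEventCount because that is what the reduce function reads.

```javascript
// Condensed copy of the question's reduce function (hypothetical name).
function brokenReduce(key, values) {
    var result = {allEventCount: 0, events: []};
    values.forEach(function (value) {
        var notfound = true;
        for (var n = 0; n < result.events.length; n++) {
            for (var ev in result.events[n]) {
                if (ev == value.event) {
                    result.events[n][ev] += value.allEventCount;
                    notfound = false;
                    break;
                }
            }
        }
        if (notfound) {
            var newEvent = {};
            newEvent[value.event] = 1;   // value.event is undefined for a partial result
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
}

// First pass: only freshly emitted values, so the output looks right.
var partial = brokenReduce("xyz", [
    {event: "adduser", allEventCount: 1},
    {event: "login", allEventCount: 1},
    {event: "login", allEventCount: 1}
]);

// Second pass: the partial result is fed back in next to a new emitted value.
// The partial has no `event` field, so it is counted under the key "undefined".
var rereduced = brokenReduce("xyz", [partial, {event: "remove", allEventCount: 1}]);
```

In this simulation the second pass keeps the running allEventCount but loses the per-event counts of the partial result, which matches both symptoms in the question: an "undefined" entry, and event counts that add up to less than allEventCount.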

So you need to change your emit to emit docs like this:

{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}

Then update your reduce function accordingly.
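As a minimal sketch of that change, the map and reduce functions below use the events / allEventCount shape shown above. The document fields userid and event are assumptions taken from the question, and the emit stub and sample documents at the bottom exist only so the pair can be checked locally outside MongoDB.

```javascript
// Map: emit a value that already has the shape reduce returns.
// The fields `userid` and `event` are assumed from the question.
var mapfn = function () {
    var ev = {};
    ev[this.event] = 1;
    emit(this.userid, {events: [ev], allEventCount: 1});
};

// Reduce: merge per-event counts; input and output share one shape.
var reducefn = function (key, values) {
    var result = {allEventCount: 0, events: []};
    values.forEach(function (value) {
        value.events.forEach(function (eventElem) {
            for (var name in eventElem) {
                var found = false;
                for (var n = 0; n < result.events.length; n++) {
                    if (result.events[n].hasOwnProperty(name)) {
                        result.events[n][name] += eventElem[name];
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    // Copy the entry so the input values are never mutated.
                    var copy = {};
                    copy[name] = eventElem[name];
                    result.events.push(copy);
                }
            }
        });
        result.allEventCount += value.allEventCount;
    });
    return result;
};

// Local check only: a stand-in for MongoDB's emit() that collects the values.
var emitted = [];
function emit(key, value) { emitted.push(value); }
[{userid: "xyz", event: "adduser"},
 {userid: "xyz", event: "login"},
 {userid: "xyz", event: "login"}].forEach(function (doc) { mapfn.call(doc); });

// Reducing everything at once must give the same result as re-reducing a partial.
var oneStep = reducefn("xyz", emitted);
var twoStep = reducefn("xyz", [reducefn("xyz", emitted.slice(0, 2)), emitted[2]]);
```

Against a real collection, the two functions would be passed to something like db.events.mapReduce(mapfn, reducefn, {out: {inline: 1}}), where the collection name events is hypothetical. The corrected function in the question follows the same idea with the field names event and totEvents.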
