MongoDB MapReduce: Not working as expected for more than 1000 records
Question
I wrote a mapreduce function where the records are emitted in the following format:
{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}
where userid is the key and the remaining fields are the value for that key. After the MapReduce function, I want to get the result in the following format:
{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}
To achieve this I wrote the following reduce function. I know this can be done with a group-by, both in the aggregation framework and in mapreduce, but we require similar functionality for a more complex scenario, so I am taking this approach.
var reducefn = function(key, values) {
    var result = {allEventCount: 0, events: []};
    values.forEach(function(value) {
        var notfound = true;
        for (var n = 0; n < result.events.length; n++) {
            var eventObj = result.events[n];
            for (var ev in eventObj) {
                if (ev == value.event) {
                    result.events[n][ev] += value.allEventCount;
                    notfound = false;
                    break;
                }
            }
        }
        if (notfound == true) {
            var newEvent = {};
            newEvent[value.event] = 1;
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
}
This runs perfectly when I run it for 1000 records, but when there are 3k or 10k records, the result I get is something like this:
{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 },
{"downgrade" : 2 } ]} }
Not able to understand where this undefined came from, and also why the sum of the individual event counts is less than allEventCount. All the docs in the collection have a non-empty event field, so there should be no chance of undefined.
MongoDB version: 2.2.1. Environment: local machine, no sharding.
In the reduce function, why does the operation result.events[n][ev] += value.allEventCount; fail when the similar operation result.allEventCount += value.allEventCount; succeeds?
The corrected reduce function, as suggested by johnyHK:
var reducefn = function(key, values) {
    var result = {totEvents: 0, event: []};
    values.forEach(function(value) {
        value.event.forEach(function(eventElem) {
            var notfound = true;
            for (var n = 0; n < result.event.length; n++) {
                var eventObj = result.event[n];
                for (var ev in eventObj) {
                    for (var evv in eventElem) {
                        if (ev == evv) {
                            result.event[n][ev] += eventElem[evv];
                            notfound = false;
                            break;
                        }
                    }
                }
            }
            if (notfound == true) {
                result.event.push(eventElem);
            }
        });
        result.totEvents += value.totEvents;
    });
    return result;
}
Answer
The shape of the object you emit from your map function must be the same as the shape of the object returned from your reduce function, because the results of a reduce can get fed back into reduce when processing large numbers of docs (as in this case).
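This can be demonstrated outside MongoDB: the sketch below (plain Node.js, no database) runs the question's reduce function on its own output to reproduce the symptom. The emitted value shape {event, allEventCount} is an assumption for illustration; the question's emit actually used {event, count}, which has the same problem.

```javascript
// The reduce function from the question, unchanged in behavior
var reducefn = function (key, values) {
    var result = { allEventCount: 0, events: [] };
    values.forEach(function (value) {
        var notfound = true;
        for (var n = 0; n < result.events.length; n++) {
            var eventObj = result.events[n];
            for (var ev in eventObj) {
                if (ev == value.event) {
                    result.events[n][ev] += value.allEventCount;
                    notfound = false;
                    break;
                }
            }
        }
        if (notfound) {
            var newEvent = {};
            // value.event is undefined when `value` is a previous reduce
            // result, so the key becomes the string "undefined"
            newEvent[value.event] = 1;
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
};

// First pass: values look like the emitted docs
var partial = reducefn("xyz", [
    { event: "login", allEventCount: 1 },
    { event: "login", allEventCount: 1 }
]);

// With many docs, MongoDB feeds this partial result back into reduce.
// `partial` has no `event` property, so an "undefined" key appears, and
// the per-event counts no longer add up to allEventCount
var rereduced = reducefn("xyz", [partial, { event: "adduser", allEventCount: 1 }]);
console.log(JSON.stringify(rereduced));
// → {"allEventCount":3,"events":[{"undefined":1},{"adduser":1}]}
```

This matches the symptom in the question: an "undefined" event appears, and the event counts sum to less than allEventCount because the per-event detail of the partial result is lost on re-reduce.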
So you need to change your emit to emit docs like this:
{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}
and then update your reduce function accordingly.