How do you find aggregate in mongo array with size greater than two?


Problem description


In MongoDB 2.6, I have documents like the two below:

nms:PRIMARY> db.checkpointstest4.find()
{ "_id" : 1, "cpu" : [ 100, 20, 60 ], "hostname" : "host1" }
{ "_id" : 2, "cpu" : [ 40, 30, 80 ], "hostname" : "host1" }

I need to find the average cpu (per cpu array index) per host. I.e. based on the two above, the average for host1 will be [70, 25, 70], because cpu[0] is (100+40)/2 = 70, etc.

I am lost when I have 3 array elements instead of two; see mongodb aggregate average of array elements.

Finally below worked for me:

var map = function () {
    for (var idx = 0; idx < this.cpu.length; idx++) {
        var mapped = {
            idx: idx,
            val: this.cpu[idx]
        };
        emit(this.hostname, {"cpu": mapped});
    }
};

var reduce = function (key, values) {

    // Fixed-length accumulators: sum and cnt are hard-coded for the
    // three cpu slots; cnt needs "var" so it does not leak as a global.
    var cpu = [], sum = [0, 0, 0], cnt = [0, 0, 0];
    values.forEach(function (value) {
        sum[value.cpu.idx] += value.cpu.val;
        cnt[value.cpu.idx] += 1;
        cpu[value.cpu.idx] = sum[value.cpu.idx] / cnt[value.cpu.idx];
    });
    return {"cpu": cpu};
};

db.checkpointstest4.mapReduce(map, reduce, {out: "checkpointstest4_result"});

Solution

Upgrading would be your best option, as mentioned, with includeArrayIndex available to $unwind from MongoDB 3.2 onwards.
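For reference, on 3.2+ such a pipeline might look like the sketch below. The stage shapes ($unwind with includeArrayIndex, $group with $avg and $push) are standard operators, but the exact composition here is my own sketch; the field names cpu and hostname come from the question's documents. The plain-JavaScript simulation underneath only shows the expected result without needing a running mongod:

```javascript
// Sketch of a 3.2+ pipeline: unwind keeping the array index, group on
// (hostname, index) to average, then regroup per host.
var pipeline = [
  { "$unwind": { "path": "$cpu", "includeArrayIndex": "idx" } },
  { "$group": {
      "_id": { "hostname": "$hostname", "idx": "$idx" },
      "avg": { "$avg": "$cpu" }
  }},
  { "$sort": { "_id.idx": 1 } },
  { "$group": {
      "_id": "$_id.hostname",
      "cpu": { "$push": "$avg" }
  }}
];

// Plain-JavaScript simulation of that pipeline on the sample documents.
var docs = [
  { _id: 1, cpu: [100, 20, 60], hostname: "host1" },
  { _id: 2, cpu: [40, 30, 80], hostname: "host1" }
];

function simulate(docs) {
  var sums = {};                                  // keyed by "hostname:idx"
  docs.forEach(function (doc) {
    doc.cpu.forEach(function (val, idx) {         // $unwind + includeArrayIndex
      var key = doc.hostname + ":" + idx;
      sums[key] = sums[key] ||
        { hostname: doc.hostname, idx: idx, total: 0, cnt: 0 };
      sums[key].total += val;
      sums[key].cnt += 1;
    });
  });
  var hosts = {};
  Object.keys(sums).forEach(function (key) {      // regroup per host
    var s = sums[key];
    hosts[s.hostname] = hosts[s.hostname] || [];
    hosts[s.hostname][s.idx] = s.total / s.cnt;   // $avg per index
  });
  return hosts;
}
```

The simulation yields [70, 25, 70] for host1, the same result the mapReduce approach produces.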

If you cannot do that, then you can always process with mapReduce instead:

db.checkpointstest4.mapReduce(
    function() {
        // Map each cpu value to a { val, cnt } pair so the reducer can
        // sum both fields per array position.
        var mapped = this.cpu.map(function(val) {
            return { "val": val, "cnt": 1 };
        });
        emit(this.hostname, { "cpu": mapped });
    },
    function(key,values) {
        var cpu = [];

        // Sum values and counts at each array index.
        values.forEach(function(value) {
            value.cpu.forEach(function(item,idx) {
                if ( cpu[idx] == undefined )
                    cpu[idx] = { "val": 0, "cnt": 0 };
                cpu[idx].val += item.val;
                cpu[idx].cnt += item.cnt;
            });
        });
        return { "cpu": cpu };
    },
    {
        "out": { "inline": 1 },
        "finalize": function(key,value) {
            // Divide each summed value by its count to get the average.
            return { 
                "cpu": value.cpu.map(function(cpu) {
                    return cpu.val / cpu.cnt;
                 })
            };
        }
    }
)

So the steps there are: in the "mapper" function, transform the array content into an array of objects, each containing the "value" from the element and a "count", for later reference as input to the "reduce" function. You need this to be consistent with how the reducer is going to work with the data, and it is necessary in order to obtain the overall counts needed to compute the average.
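To make that mapper step concrete, here is a stand-alone plain-JavaScript run of it, outside mongod, with emit stubbed out as an array push (the sample documents are the two from the question):

```javascript
// Stub out emit() so the mapper can run outside mongod.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// The answer's mapper: each cpu value becomes a { val, cnt } pair.
var map = function () {
    var mapped = this.cpu.map(function (val) {
        return { "val": val, "cnt": 1 };
    });
    emit(this.hostname, { "cpu": mapped });
};

var docs = [
    { _id: 1, cpu: [100, 20, 60], hostname: "host1" },
    { _id: 2, cpu: [40, 30, 80], hostname: "host1" }
];
// mapReduce calls the mapper with each document as `this`.
docs.forEach(function (doc) { map.call(doc); });
// emitted[0].value.cpu is now [{val:100,cnt:1},{val:20,cnt:1},{val:60,cnt:1}]
```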

In the "reducer" itself you are basically summing the array contents at each position, for both the "value" and the "count". This is important as the "reduce" function can be called multiple times in the overall reduction process, feeding its output back as "input" in a subsequent call. That is why both the mapper and the reducer work in this format.
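That re-reduce behaviour can be checked stand-alone: reducing everything in one call, or reducing a partial result first and feeding it back in, must give the same sums and counts. A quick check using the answer's reducer; the mapped values below are the ones the mapper would emit for the two sample documents:

```javascript
// The answer's reducer: sums val and cnt per array position.
var reduce = function (key, values) {
    var cpu = [];
    values.forEach(function (value) {
        value.cpu.forEach(function (item, idx) {
            if (cpu[idx] == undefined)
                cpu[idx] = { "val": 0, "cnt": 0 };
            cpu[idx].val += item.val;
            cpu[idx].cnt += item.cnt;
        });
    });
    return { "cpu": cpu };
};

// Mapped values for the two sample documents.
var a = { cpu: [{ val: 100, cnt: 1 }, { val: 20, cnt: 1 }, { val: 60, cnt: 1 }] };
var b = { cpu: [{ val: 40, cnt: 1 }, { val: 30, cnt: 1 }, { val: 80, cnt: 1 }] };

// Reduce in one pass...
var once = reduce("host1", [a, b]);
// ...or reduce `a` alone first, then feed that partial result back in,
// as mongod may do across multiple reduce invocations.
var partial = reduce("host1", [a]);
var twice = reduce("host1", [partial, b]);
// Both orders give identical sums and counts, which is exactly what
// makes the { val, cnt } format safe for repeated reduction.
```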

With the final reduced results, the finalize function is called to simply look at each summed "value" and "count" and divide the value by its count to return the average.
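Stand-alone, that finalize step looks like this when applied to the summed { val, cnt } pairs the reducer produces for the sample data:

```javascript
// The answer's finalize step: divide each summed value by its count.
var finalize = function (key, value) {
    return {
        "cpu": value.cpu.map(function (cpu) {
            return cpu.val / cpu.cnt;
        })
    };
};

// Reduced output for the two sample documents (sums and counts per index).
var reduced = { cpu: [{ val: 140, cnt: 2 }, { val: 50, cnt: 2 }, { val: 140, cnt: 2 }] };
var result = finalize("host1", reduced);
// result.cpu is [70, 25, 70]
```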

Mileage may vary on whether modern aggregation pipeline processing or this mapReduce process will perform better, mostly depending on the data. Using $unwind in the prescribed way will certainly increase the number of documents to be analyzed and thus produce overhead. By contrast, JavaScript processing, as opposed to native operators in the aggregation framework, will generally be slower, but the per-document processing overhead here is reduced since the arrays are kept intact.

The advice I would give is to use this if upgrading to 3.2 is not an option; but if it is an option, then at least benchmark the two on your data and expected growth to see which works best for you.


Returns

{
        "results" : [
                {
                        "_id" : "host1",
                        "value" : {
                                "cpu" : [
                                        70,
                                        25,
                                        70
                                ]
                        }
                }
        ],
        "timeMillis" : 38,
        "counts" : {
                "input" : 2,
                "emit" : 2,
                "reduce" : 1,
                "output" : 1
        },
        "ok" : 1
}
