有效计算MongoDB中的发生百分比 [英] Efficiently count percentage of occurrence in MongoDB

查看:50
本文介绍了有效计算MongoDB中的发生百分比的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,我正在修改MongoDB,并尝试获取 count()聚合查询以正确缩放,以允许我轻松地计算出其中某些值的出现百分比整个馆藏中的文件.

So, I'm tinkering with MongoDB, and I'm trying to get the count() aggregation querying to scale properly, to allow me to easily calculate the percentage of occurrence of certain values in the document across the collection.

我有一个结构如下的文档:

I have a document with a structure like:

{
    foo : 'bar',
    moo : 'cow',
    values : {
        alpha : true,
        beta : false,
        gamma : false,
        delta : true ... (many more)
    }
}

现在,我有数千个此类文档,并且我想有效地计算 values 对象中所有值的true(或false)百分比(在我的情况下),大约有50个).即alpha正确,beta正确的时间百分比等等.

Now, I have several thousand of these documents, and I want to efficiently calculate the percentage of true (or the percentage of false) of all the values in the values object (and in my case, there are ~50). ie, what percentage of the time alpha is true, beta is true, etc.

我从 count()开始幼稚,但似乎一次只允许一个查询,因此导致我执行了此操作(使用PHP Mongo类,但基本上它只是一个常规 count()函数:

I started naively with count(), but it seems like it only allows one query at a time, so that led me to do this (using the PHP Mongo class, but its basically just a regular count() function:

 $array_of_keys = array('alpha', 'beta', 'gamma', 'delta'...);
 for($i=0;$i<count($array_of_keys);$i++){
    $array_of_keys = [...]
    for($i=0;$i<count($array_of_keys);$i++){

$false  = intval($collection->count(array($array_of_keys[$i]=>false)));
$true  = intval($collection->count(array($array_of_keys[$i]=>true)));
}

但是即使记录数量很少(大约100条),这也花费了9秒.

But even with a very small number of records (around 100), this took 9 seconds.

什么是最好的方法?

推荐答案

这是一个简单的 MapReduce 即可满足您的需求:

Here is a simple MapReduce that will do what you want:

map = function() {
    for (var key in this.values){
        emit(key, {count:1, trues: (this.values[key] ? 1 : 0)});
    }
}

reduce = function(key, values){
    var out = values[0];
    for (var i=1; i < values.length; i++){
        out.count += values[i].count;
        out.trues += values[i].trues;
    }
    return out;
}

finalize = function(key, value){
    value.ratio = value.trues / value.count;
    return value;
}

db.runCommand({mapReduce:'collection',
               map:map,
               reduce:reduce,
               finalize:finalize,
               out:'counts'
               })

db.counts.findOne({_id:'alpha'})
{_id: 'alpha', value: {count: 100, trues: 52, ratio: 0.52}}

当您插入主收藏集时,您也可以像这样更新将为您提供实时数据视图:

You could also do an upsert like this when you insert into your main collection which will give you a real-time view into your data:

for (var key in this.values){
    db.counts.update({_id:key},
                     {$inc:{count:1, trues: (this.values[key] ? 1 : 0)}},
                     true);
}

实际上,您甚至可以结合使用这些方法.进行一次MapReduce批处理作业以填充计数集合,然后使用upserts使其保持最新状态.

In fact, you could even combine these methods. Do a one-time MapReduce batch job to populate the counts collection and then use upserts to keep it up to date.

这篇关于有效计算MongoDB中的发生百分比的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆