Get looked up array count for a document

Question

I have two collections, words and phrases. Each word document has an array of phrase ids, and each phrase can be active or inactive.

For example:

words:
{"word" => "hello", "phrases" => [1,2]}
{"word" => "table", "phrases" => [2]}

phrases:
{"id" => 1, "phrase" => "hello world!", "active" => 1}
{"id" => 2, "phrase" => "hello, i have already bought new table", "active" => 0}

I need to get the count of active phrases for each word.

In PHP I do it like this:
1. Get all words.
2. For each word, get the count of active phrases with the condition ['active' => 1].

Question: How can I get the words together with their active phrase counts in one request? I tried to use MapReduce, but I still need to make a request for each word to get its count of active phrases.

UPD: My test collection contains 92 000 phrases and 23 000 words.

I have already tested both variants: a PHP loop that gets the phrase count for each word, and an aggregation pipeline in MongoDB.

However, I changed the aggregation pipeline given in the comments below because of phrases_data. It is an array, so I can't use $match on it; I use $unwind after $lookup instead.

[ '$unwind' => '$5' ],
[
    '$lookup' => [
        'from' => 'phrases_926ee3bc9fa72b029e028ec90e282072ea0721d1',
        'localField' => '5',
        'foreignField' => '0',
        'as' => 'phrases_data'
    ]
],
[ '$unwind' => '$phrases_data' ],
[ '$match' => [ 'phrases_data.3' => 77 ] ], // phrases_data.3 => 77 is equivalent to phrases_data.active => 1
[
    '$group' => [
        '_id' => [ 'word' => '$1', 'id' => '$0' ],
        'active_count' => [ '$sum' => 1 ]
    ]
],
[ '$match' => [ 'active_count' => [ '$gt' => 0 ] ] ],
[ '$sort' => [ 'active_count' => -1 ] ]


The problem is that the $group stage takes 80% of the processing time, and the pipeline is much slower than the PHP loop. Here are my results for the test collection:

1. PHP loop (get words, then get the phrase count for each word): 10 seconds
2. Aggregation pipeline: 20 seconds
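
Since the cost reportedly sits in $group, one variation that might be worth trying is to skip both $unwind stages and $group entirely, and count inside each word document instead. The sketch below is an assumption and not something measured on this data: it uses the simplified field names from the example (word, phrases, id, active) rather than the numeric keys of the real collection, and it relies on MongoDB 3.4 or later, where $lookup can match the elements of an array localField directly. Note that if a word listed the same phrase id twice, this version would count that phrase once, unlike the $unwind-based pipeline.

```javascript
// Sketch only: simplified field names from the example, MongoDB 3.4+ assumed.
const pipeline = [
  // Join all phrases whose id appears in the word's phrases array (no $unwind needed).
  {
    $lookup: {
      from: "phrases",
      localField: "phrases",
      foreignField: "id",
      as: "phrases_data",
    },
  },
  // Count the active phrases per word with $filter + $size instead of $group.
  {
    $project: {
      word: 1,
      active_count: {
        $size: {
          $filter: {
            input: "$phrases_data",
            as: "p",
            cond: { $eq: ["$$p.active", 1] },
          },
        },
      },
    },
  },
  // Keep only words with at least one active phrase, most active first.
  { $match: { active_count: { $gt: 0 } } },
  { $sort: { active_count: -1 } },
];

// In the mongo shell this would be run as: db.words.aggregate(pipeline)
```

Whether this is faster than the PHP loop depends on the data and on indexing; an index on the joined field of the phrases collection generally helps $lookup.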

Recommended answer

db.words.aggregate([
    { "$unwind" : "$phrases"},
    {
        "$lookup": {
            "from": "phrases",
            "localField": "phrases",
            "foreignField": "id",
            "as": "phrases_data"
        }
    },
    { "$match" : { "phrases_data.active" : 1} },
    { "$group" : {
        "_id" : "$word",
        "active_count" : { $sum : 1 }
        }
    }
]);

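You can use the aggregation pipeline above:

  1. Unwind the phrases array of each document in the words collection into separate documents
  2. Do a lookup (join) in the phrases collection using the unwound phrase id
  3. Filter the joined phrases with $match to check whether they are active
  4. Finally, group by word and count the phrases using $sum : 1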
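
To make the four stages concrete, here is a small in-memory sketch in plain JavaScript that mimics them on the sample data from the question. It does not talk to MongoDB; the helper name countActivePhrases is hypothetical and only illustrates what each stage does to the documents.

```javascript
// Sample data from the question (simplified field names).
const words = [
  { word: "hello", phrases: [1, 2] },
  { word: "table", phrases: [2] },
];
const phrases = [
  { id: 1, phrase: "hello world!", active: 1 },
  { id: 2, phrase: "hello, i have already bought new table", active: 0 },
];

// Hypothetical helper: mimics $unwind -> $lookup -> $match -> $group in memory.
function countActivePhrases(words, phrases) {
  // $unwind: one document per (word, phrase id) pair.
  const unwound = words.flatMap((w) =>
    w.phrases.map((phraseId) => ({ word: w.word, phrases: phraseId }))
  );

  // $lookup: attach the matching phrase documents as an array.
  const joined = unwound.map((doc) => ({
    ...doc,
    phrases_data: phrases.filter((p) => p.id === doc.phrases),
  }));

  // $match on "phrases_data.active": keep documents with an active joined phrase.
  const active = joined.filter((doc) =>
    doc.phrases_data.some((p) => p.active === 1)
  );

  // $group by word with $sum: 1.
  const counts = new Map();
  for (const doc of active) {
    counts.set(doc.word, (counts.get(doc.word) || 0) + 1);
  }
  return [...counts].map(([word, active_count]) => ({ _id: word, active_count }));
}

const result = countActivePhrases(words, phrases);
console.log(result);
```

With the sample data only "hello" has an active phrase (id 1), so "table" does not appear in the result at all, just as it would be dropped by the $match stage of the pipeline.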
