Get looked up array count for a document
Problem description
I have 2 collections: words and phrases. Each word document has an array of phrase ids, and each phrase can be active or inactive.
For example :
words :
{"word" => "hello", "phrases" => [1,2]}
{"word" => "table", "phrases" => [2]}
phrases :
{"id" => 1, "phrase" => "hello world!", "active" => 1}
{"id" => 2, "phrase" => "hello, i have already bought new table", "active" => 0}
I need to get the count of active phrases for each word.
In PHP I do it like this:
1. Get all words
2. For each word, get the count of active phrases with the condition ['active' => 1]
Question: How can I get the words together with their active phrase counts in one request? I tried to use MapReduce, but I still need to make a request for each word to get its count of active phrases.
UPD:
In my test collection there are 92 000 phrases and 23 000 words.
I have already tested both variants: a PHP loop over the words in which I get the phrase count for each word, and an aggregation in Mongo.
But I changed the aggregation pipeline from the comments below because of phrases_data. It is an array, so I can't use $match on it. I use $unwind after $lookup.
[ '$unwind' => '$5'],
[
'$lookup' => [
'from' => 'phrases_926ee3bc9fa72b029e028ec90e282072ea0721d1',
'localField' => '5',
'foreignField' => '0',
'as' => 'phrases_data'
]
],
[ '$unwind' => '$phrases_data'],
[ '$match' => [ 'phrases_data.3' => 77] ], //phrases_data.3 => 77 it is similar to phrases_data.active => 1
[ '$group' =>
[
'_id' => ['word' => '$1', 'id' => '$0'],
'active_count' => [ '$sum' => 1]
]
],
[ '$match' => [ 'active_count' => ['$gt' => 0]] ],
[ '$sort' =>
[
'active_count' => -1
]
]
The problem is that the $group stage takes 80% of the processing time, and the pipeline is much slower than the PHP loop. Here are my results for the test collection:
1. PHP loop (get words -> get phrase count for each word): 10 seconds
2. Aggregation function: 20 seconds
Recommended answer
db.words.aggregate([
{ "$unwind" : "$phrases"},
{
"$lookup": {
"from": "phrases",
"localField": "phrases",
"foreignField": "id",
"as": "phrases_data"
}
},
{ "$match" : { "phrases_data.active" : 1} },
{ "$group" : {
"_id" : "$word",
"active_count" : { $sum : 1 }
}
}
]);
You can use the aggregation pipeline above:
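- Unwind the phrases array of each words document into separate documents
- Look up (join) each unwound phrase id in the phrases collection
- Filter the phrases with $match, keeping only the active ones
- Finally, group by word and count the phrases with $sum : 1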
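To make the four steps concrete, here is a minimal sketch in plain JavaScript that mimics each stage on in-memory arrays, using the sample data from the question. It does not talk to MongoDB; the helper name countActivePhrases is hypothetical and only illustrates what each stage is expected to produce.

```javascript
// Sample data copied from the question.
const words = [
  { word: "hello", phrases: [1, 2] },
  { word: "table", phrases: [2] },
];
const phrases = [
  { id: 1, phrase: "hello world!", active: 1 },
  { id: 2, phrase: "hello, i have already bought new table", active: 0 },
];

// Hypothetical helper (not a MongoDB API): imitates the pipeline stages in memory.
function countActivePhrases(words, phrases) {
  // $unwind: one row per (word, phrase id) pair.
  const unwound = words.flatMap((w) =>
    w.phrases.map((phraseId) => ({ word: w.word, phrases: phraseId }))
  );

  // $lookup: attach the matching phrase documents as an array named phrases_data.
  const joined = unwound.map((row) => ({
    ...row,
    phrases_data: phrases.filter((p) => p.id === row.phrases),
  }));

  // $match: keep rows where a joined phrase has active === 1.
  const active = joined.filter((row) =>
    row.phrases_data.some((p) => p.active === 1)
  );

  // $group with $sum : 1: count the remaining rows per word.
  const counts = new Map();
  for (const row of active) {
    counts.set(row.word, (counts.get(row.word) || 0) + 1);
  }
  return [...counts].map(([word, n]) => ({ _id: word, active_count: n }));
}

const result = countActivePhrases(words, phrases);
console.log(result);
```

Note that a word with no active phrases (here "table") is dropped at the filter step, so it does not appear in the result at all rather than appearing with a count of 0.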