Using $push with $group with pymongo
Question
Fix my make_pipeline() function so that, using an aggregation query, it counts the number of tweets for each user, adds them to an array, and returns the 5 users with the most tweets.
Using an aggregation query, count the number of tweets for each user. In the same $group stage, use $push to accumulate all the tweet texts for each user.
Limit your output to the 5 users with the most tweets.
Your result documents should include only the fields:
- "_id" (screen name of the user)
- "count" (number of tweets found for the user)
- "tweet_texts" (a list of the tweet texts found for the user)
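As a rough illustration of that requirement, the following sketch checks whether a result document has exactly the three expected fields. The helper name has_expected_shape is hypothetical and is not part of the exercise or of pymongo.

```python
EXPECTED_FIELDS = {"_id", "count", "tweet_texts"}


def has_expected_shape(doc):
    """Return True if doc has exactly the three required fields with plausible types."""
    return (
        set(doc) == EXPECTED_FIELDS                 # no missing and no extra fields
        and isinstance(doc["count"], int)           # number of tweets for the user
        and isinstance(doc["tweet_texts"], list)    # accumulated tweet texts
    )


print(has_expected_shape({"_id": "some_user", "count": 1, "tweet_texts": ["hello"]}))
```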
To achieve the previous objective I am testing the following code:
def make_pipeline():
    # complete the aggregation pipeline
    pipeline = [
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
        {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
        {"$sort": {"count": -1}},
        {"$limit": 5}
    ]
    return pipeline
Logic
First I group all the tweets by username. Then, in the same stage, I push the text of each tweet to tweet_texts and count each occurrence that was grouped. I believe this gives me the number of tweets for each user, from which I can find the users with the most tweets.
Then I make a projection to select only the three fields I want:
- _id
- count
- tweet_texts
I finish by sorting and limiting the number of results.
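The intended logic (group by screen name, count, collect texts, sort, limit) can be sketched in plain Python. This is only an emulation for illustration, not how MongoDB executes the pipeline; the function name top_tweeters and the sample data are made up.

```python
from collections import defaultdict


def top_tweeters(tweets, limit=5):
    """Emulate $group (with $sum and $push), $sort, and $limit in plain Python."""
    groups = defaultdict(lambda: {"count": 0, "tweet_texts": []})
    for tweet in tweets:
        group = groups[tweet["user"]["screen_name"]]  # group key: "$user.screen_name"
        group["count"] += 1                           # {"$sum": 1}
        group["tweet_texts"].append(tweet["text"])    # {"$push": "$text"}

    results = [{"_id": name, **fields} for name, fields in groups.items()]
    results.sort(key=lambda doc: doc["count"], reverse=True)  # {"$sort": {"count": -1}}
    return results[:limit]                                    # {"$limit": 5}


# Hypothetical sample data, reduced to the two fields the pipeline reads.
sample_tweets = [
    {"user": {"screen_name": "alice"}, "text": "a1"},
    {"user": {"screen_name": "bob"}, "text": "b1"},
    {"user": {"screen_name": "alice"}, "text": "a2"},
]
print(top_tweeters(sample_tweets))
```

Note that no projection step is needed here: after grouping, each result already has only the grouping key and the two accumulated fields.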
I am passing the test, but not the submission. What am I doing wrong? I know the error must be in the first (group) stage, but I can't for the love of God find what I am doing wrong.
{
    "_id" : ObjectId("5304e2e3cc9e684aa98bef97"),
    "text" : "First week of school is over :P",
    "in_reply_to_status_id" : null,
    "retweet_count" : null,
    "contributors" : null,
    "created_at" : "Thu Sep 02 18:11:25 +0000 2010",
    "geo" : null,
    "source" : "web",
    "coordinates" : null,
    "in_reply_to_screen_name" : null,
    "truncated" : false,
    "entities" : {
        "user_mentions" : [ ],
        "urls" : [ ],
        "hashtags" : [ ]
    },
    "retweeted" : false,
    "place" : null,
    "user" : {
        "friends_count" : 145,
        "profile_sidebar_fill_color" : "E5507E",
        "location" : "Ireland :)",
        "verified" : false,
        "follow_request_sent" : null,
        "favourites_count" : 1,
        "profile_sidebar_border_color" : "CC3366",
        "profile_image_url" : "http://a1.twimg.com/profile_images/1107778717/phpkHoxzmAM_normal.jpg",
        "geo_enabled" : false,
        "created_at" : "Sun May 03 19:51:04 +0000 2009",
        "description" : "",
        "time_zone" : null,
        "url" : null,
        "screen_name" : "Catherinemull",
        "notifications" : null,
        "profile_background_color" : "FF6699",
        "listed_count" : 77,
        "lang" : "en",
        "profile_background_image_url" : "http://a3.twimg.com/profile_background_images/138228501/149174881-8cd806890274b828ed56598091c84e71_4c6fd4d8-full.jpg",
        "statuses_count" : 2475,
        "following" : null,
        "profile_text_color" : "362720",
        "protected" : false,
        "show_all_inline_media" : false,
        "profile_background_tile" : true,
        "name" : "Catherine Mullane",
        "contributors_enabled" : false,
        "profile_link_color" : "B40B43",
        "followers_count" : 169,
        "id" : 37486277,
        "profile_use_background_image" : true,
        "utc_offset" : null
    },
    "favorited" : false,
    "in_reply_to_user_id" : null,
    "id" : NumberLong("22819398300")
}
Please help!
Recommended answer
After reading the comments, I found that
pipeline = [
    {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
    {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
    {"$sort": {"count": -1}},
    {"$limit": 5}
]
should actually be changed to:
pipeline = [
    {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 5}
]
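Put back into the function from the question, the corrected pipeline might look like the sketch below. The run_pipeline helper is hypothetical; it only assumes an object with an aggregate method, such as a pymongo Collection.

```python
def make_pipeline():
    """Return the corrected pipeline: group, then sort, then limit (no $project)."""
    return [
        {"$group": {
            "_id": "$user.screen_name",         # one group per screen name
            "tweet_texts": {"$push": "$text"},  # collect every tweet text
            "count": {"$sum": 1},               # count tweets in the group
        }},
        {"$sort": {"count": -1}},               # most tweets first
        {"$limit": 5},                          # keep the top 5 users
    ]


def run_pipeline(collection, pipeline):
    """Run the pipeline on a pymongo Collection (or any object with .aggregate)."""
    return list(collection.aggregate(pipeline))
```

With a running MongoDB server, this could be called as run_pipeline(MongoClient().some_db.some_collection, make_pipeline()), where the database and collection names are placeholders for wherever the tweets are stored.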
Why?
The full answer and explanation can be seen in the answer:
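The conclusion of the story is that I was using the $project stage wrongly. After the $group stage, the documents contain only _id and the accumulated fields, so "$user.screen_name" no longer refers to anything and the projected _id loses the screen name. Not only was the stage not needed in the first place, but to make it idempotent it should have been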
{"$project": {"_id": "$_id", "count": 1, "tweet_texts": 1}},
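To see why "$_id" works where "$user.screen_name" does not, here is a simplified emulation of field-path lookup on a document as it looks after $group. The helper resolve_field_path is hypothetical and is not MongoDB's implementation; the count value is illustrative.

```python
def resolve_field_path(doc, path):
    """Follow a dotted field path such as "user.screen_name"; return None if missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# A document as it looks after the $group stage: no "user" field is left.
grouped_doc = {
    "_id": "Catherinemull",
    "count": 1,
    "tweet_texts": ["First week of school is over :P"],
}

print(resolve_field_path(grouped_doc, "user.screen_name"))  # path no longer exists
print(resolve_field_path(grouped_doc, "_id"))               # the grouped screen name
```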
I also highly recommend his answer:
The following users deserve kudos++:
For directing me onto the right path!