Using $push with $group with pymongo

Question

Fix my make_pipeline() function so that, using an aggregation query, it counts the number of tweets for each user, adds the tweet texts to an array, and returns the 5 users with the most tweets.

Using an aggregation query, count the number of tweets for each user. In the same $group stage, use $push to accumulate all the tweet texts for each user.

Limit your output to the 5 users with the most tweets.

Your result documents should include only the fields:

  • "_id" (screen name of the user),
  • "count" (number of tweets found for the user),
  • "tweet_texts" (a list of the tweet texts found for the user).

To achieve this, I am testing the following code:

def make_pipeline():
    # complete the aggregation pipeline
    pipeline = [
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
        {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
        {"$sort" : {"count" : -1}},
        {"$limit": 5}
    ]
    return pipeline

Logic

First I group all the tweets by the user's screen name. Then, in the same stage, I push the text of each tweet to tweet_texts and count every tweet that falls into the group. I believe this gives me the number of tweets for each user.
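The grouping step described above can be pictured in plain Python. This is only a rough in-memory emulation of what $push and $sum accumulate, not how MongoDB executes the stage; the sample tweets and the helper name group_by_screen_name are made up for illustration.

```python
# Hypothetical sample data: only the two fields the $group stage reads.
tweets = [
    {"text": "First week of school is over :P", "user": {"screen_name": "Catherinemull"}},
    {"text": "hello", "user": {"screen_name": "alice"}},
    {"text": "second tweet", "user": {"screen_name": "Catherinemull"}},
]


def group_by_screen_name(docs):
    """Emulate a $group on "$user.screen_name" with $push of "$text" and $sum of 1."""
    groups = {}
    for doc in docs:
        key = doc["user"]["screen_name"]
        # Create the group on first sight of this screen name.
        group = groups.setdefault(key, {"_id": key, "tweet_texts": [], "count": 0})
        group["tweet_texts"].append(doc["text"])  # what $push accumulates
        group["count"] += 1                       # what {"$sum": 1} accumulates
    return list(groups.values())


grouped = group_by_screen_name(tweets)
```

Each resulting document carries only _id, tweet_texts and count; the original user sub-document is no longer present after grouping.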

Then I make a projection to select only the three fields I want:

  • _id
  • count
  • tweet_texts

I finish by sorting and limiting the number of results.

I am passing the test, but not the submission. What am I doing wrong? I know the error must be in the first (group) stage, but I can't find, for the love of God, what I am doing wrong.

{
    "_id" : ObjectId("5304e2e3cc9e684aa98bef97"),
    "text" : "First week of school is over :P",
    "in_reply_to_status_id" : null,
    "retweet_count" : null,
    "contributors" : null,
    "created_at" : "Thu Sep 02 18:11:25 +0000 2010",
    "geo" : null,
    "source" : "web",
    "coordinates" : null,
    "in_reply_to_screen_name" : null,
    "truncated" : false,
    "entities" : {
        "user_mentions" : [ ],
        "urls" : [ ],
        "hashtags" : [ ]
    },
    "retweeted" : false,
    "place" : null,
    "user" : {
        "friends_count" : 145,
        "profile_sidebar_fill_color" : "E5507E",
        "location" : "Ireland :)",
        "verified" : false,
        "follow_request_sent" : null,
        "favourites_count" : 1,
        "profile_sidebar_border_color" : "CC3366",
        "profile_image_url" : "http://a1.twimg.com/profile_images/1107778717/phpkHoxzmAM_normal.jpg",
        "geo_enabled" : false,
        "created_at" : "Sun May 03 19:51:04 +0000 2009",
        "description" : "",
        "time_zone" : null,
        "url" : null,
        "screen_name" : "Catherinemull",
        "notifications" : null,
        "profile_background_color" : "FF6699",
        "listed_count" : 77,
        "lang" : "en",
        "profile_background_image_url" : "http://a3.twimg.com/profile_background_images/138228501/149174881-8cd806890274b828ed56598091c84e71_4c6fd4d8-full.jpg",
        "statuses_count" : 2475,
        "following" : null,
        "profile_text_color" : "362720",
        "protected" : false,
        "show_all_inline_media" : false,
        "profile_background_tile" : true,
        "name" : "Catherine Mullane",
        "contributors_enabled" : false,
        "profile_link_color" : "B40B43",
        "followers_count" : 169,
        "id" : 37486277,
        "profile_use_background_image" : true,
        "utc_offset" : null
    },
    "favorited" : false,
    "in_reply_to_user_id" : null,
    "id" : NumberLong("22819398300")
}

Please help!

Recommended answer

After reading the comments, I found that

pipeline = [
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
        {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
        {"$sort" : {"count" : -1}},
        {"$limit": 5}
    ]

should actually be changed to:

pipeline = [ 
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}}, 
        {"$sort" : {"count" : -1}}, 
        {"$limit": 5}
    ]
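For completeness, here is a minimal sketch of how the corrected pipeline might be run with pymongo. The connection URI, the database name "twitter", the collection name "tweets" and the helper top_tweeters are assumptions chosen for illustration; the original exercise does not state them.

```python
def make_pipeline():
    # Group by screen name, collect texts, count tweets, keep the top 5.
    return [
        {"$group": {"_id": "$user.screen_name",
                    "tweet_texts": {"$push": "$text"},
                    "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": 5},
    ]


def top_tweeters(uri="mongodb://localhost:27017", db_name="twitter", coll_name="tweets"):
    """Run the pipeline; uri, db_name and coll_name are assumed values."""
    # Imported lazily so make_pipeline() can be inspected without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        # aggregate() returns a cursor; list() materialises the five documents.
        return list(client[db_name][coll_name].aggregate(make_pipeline()))
    finally:
        client.close()
```

Calling top_tweeters() needs a reachable MongoDB server holding the tweet documents; make_pipeline() on its own has no such requirement.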

Why?

The full answer and explanation can be seen in the answer:

The conclusion of the story is that I was using the $project stage wrongly. Not only was it not needed in the first place, but to make it idempotent it should be

{"$project": {"_id": "$_id", "count": 1, "tweet_texts": 1}},
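One way to picture the difference: after $group, each document has only _id, tweet_texts and count, so a path such as "$user.screen_name" has nothing left to point at, while "$_id" does. The sketch below is a simplified plain-Python model of field path lookup. The resolve helper and the MISSING sentinel are hypothetical, not part of pymongo or MongoDB, and it is an assumption that this model mirrors the server's behaviour closely enough for the point being made.

```python
MISSING = object()  # sentinel meaning "the field does not exist"


def resolve(doc, path):
    """Look up a field path such as "$user.screen_name" in a nested dict."""
    value = doc
    for part in path.lstrip("$").split("."):
        if not isinstance(value, dict) or part not in value:
            return MISSING
        value = value[part]
    return value


# Shape of a document after the $group stage (values are illustrative).
grouped_doc = {
    "_id": "Catherinemull",
    "tweet_texts": ["First week of school is over :P"],
    "count": 1,
}

broken_id = resolve(grouped_doc, "$user.screen_name")  # no "user" field any more
fixed_id = resolve(grouped_doc, "$_id")                # the grouped screen name
```

In this model broken_id is the MISSING sentinel and fixed_id is the screen name, which matches the observation that the original $project no longer produced the expected _id.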

I also highly recommend his answer:

The following users deserve kudos++:

For directing me onto the right path!
