Getting tweets by date with tweepy

Question

I pulled the maximum number of tweets allowed from USATODAY, which was 3,000.

Now I want to create a script that automatically pulls USATODAY's tweets at 11:59 PM every day.

I was going to use the Streaming API, but then I would have to keep it running all day.

Can I get some insight on how to create a script that runs the REST API every night at 11:59 PM to pull that day's tweets? If not, does anyone know how to pull tweets by date?

I was thinking about putting an if/else statement in my for loop, but that seems inefficient because it would have to search through 3,000 tweets every night.

This is what I have now:

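import tweepy
from pymongo import MongoClient

# consumer_key, consumer_secret, access_token_key and access_token_secret
# are assumed to hold your Twitter app credentials.
client = MongoClient('localhost', 27017)
db = client['twitter_db']
collection = db['usa_collection']

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth)

# Page through USATODAY's timeline and store each tweet's raw JSON.
for tweet in tweepy.Cursor(api.user_timeline, screen_name='USATODAY').items():
    collection.insert_one(tweet._json)  # insert() is deprecated in PyMongo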

Answer

You can retrieve the tweets page by page. On each page, iterate over the tweets, read each tweet's creation time from tweet.created_at, and take the difference between that time and the current time. If the difference is less than one day, the tweet is one you want; otherwise exit the loop. Because the timeline comes back newest first, every tweet after the first old one is older still, so there is nothing left to check. Note that created_at is in UTC, so compare it with the current UTC time rather than local time.

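import datetime
import tweepy

def get_tweets(api, username):
    # The timeline is returned newest first, so stop at the first tweet
    # that is more than one day old.
    now = datetime.datetime.now(datetime.timezone.utc)
    for page in tweepy.Cursor(api.user_timeline, screen_name=username).pages():
        for tweet in page:
            created = tweet.created_at
            # created_at is UTC: naive in tweepy 3.x, timezone-aware in 4.x.
            if created.tzinfo is None:
                created = created.replace(tzinfo=datetime.timezone.utc)
            if now - created < datetime.timedelta(days=1):
                # Do processing here:
                print(tweet.text)
            else:
                return

get_tweets(api, "anmoluppal366")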
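The question also asks how to run this every night at 11:59 PM. One option, sketched here under assumptions, is a long-running Python process that sleeps until the next 23:59 in local time and then calls a job. The helper names seconds_until and run_nightly are hypothetical, not part of tweepy. A cron entry such as 59 23 * * * python3 /path/to/script.py (the path is a placeholder) is a common alternative that avoids keeping a process alive.

```python
import datetime
import time


def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next hour:minute, using naive local time."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for the same time tomorrow.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_nightly(job, hour=23, minute=59):
    """Hypothetical helper: call job() once a day at hour:minute, forever."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()
```

Usage would be run_nightly(lambda: get_tweets(api, "USATODAY")). Unlike a stream, the process spends almost all of its time asleep. Because it uses naive local time, a run near a daylight-saving change may fire an hour early or late.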

Note: this does not walk through all 3,000 of the user's tweets. It only iterates over the tweets created in the 24 hours before the script starts, plus the one older tweet that ends the loop.
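The answer uses a rolling 24-hour window measured from when the script starts. If you want the calendar day in your own time zone instead (everything posted since local midnight when the job runs at 11:59 PM), you can compute that day's bounds in UTC and compare created_at against them. This is a minimal sketch that assumes a timezone-aware local now; day_bounds_utc and in_day are hypothetical helper names.

```python
import datetime

UTC = datetime.timezone.utc


def day_bounds_utc(now):
    """Return (start, end) in UTC of the local calendar day containing `now`.

    `now` must be timezone-aware, e.g. datetime.datetime.now().astimezone().
    """
    start_local = now.replace(hour=0, minute=0, second=0, microsecond=0)
    end_local = start_local + datetime.timedelta(days=1)
    return start_local.astimezone(UTC), end_local.astimezone(UTC)


def in_day(created_at, bounds):
    """True if an aware UTC timestamp falls inside [start, end)."""
    start, end = bounds
    return start <= created_at < end
```

Inside get_tweets you would compute bounds = day_bounds_utc(datetime.datetime.now().astimezone()) once, process tweets for which in_day(created, bounds) is true, and return at the first tweet whose created time is earlier than bounds[0].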
