Unable to stop Streaming in tweepy after one minute


Problem description

I am trying to stream Twitter data for a fixed period, say 5 minutes, using the Stream.filter() method, and I store the retrieved tweets in a JSON file. The problem is that I cannot stop filter() from within the program; I have to stop the execution manually. I tried stopping the stream based on system time using the time package. That stops tweets from being written to the JSON file, but the stream itself keeps running, so execution never continues to the next line of code. I am using an IPython notebook to write and execute the code. Here's the code:

import time

import tweepy
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

class MyListener(StreamListener):

    def __init__(self, start_time, time_limit=60):
        self.time = start_time
        self.limit = time_limit

    def on_data(self, data):
        while (time.time() - self.time) < self.limit:
            try:
                saveFile = open('abcd.json', 'a')
                saveFile.write(data)
                saveFile.write('\n')
                saveFile.close()
                return True
            except BaseException as e:
                print 'failed ondata,', str(e)
                time.sleep(5)
        return True

    def on_status(self, status):
        if (time.time() - self.time) >= self.limit:
            print 'time is over'
            return False

    def on_error(self, status):
        if (time.time() - self.time) >= self.limit:
            print 'time is over'
            return False
        else:
            print(status)
            return True

start_time = time.time()
stream_data = Stream(auth, MyListener(start_time,20))
stream_data.filter(track=['name1','name2',...list ...,'name n'])  # list of the strings I want to track

These links are similar, but they do not answer my question directly:

Tweepy: Stream data for X minutes?

Stopping Tweepy steam after a duration parameter (# lines, seconds, #Tweets, etc)

Tweepy Streaming - Stop collecting tweets at x amount

I used this link as my reference, http://stats.seandolinar.com/collecting-twitter-data-using-a-python-stream-listener/

Solution

  1. In order to close the stream you need to return False from on_data(), or on_status().

  2. Because tweepy.Stream() runs a while loop itself, you don't need the while loop in on_data().

  3. When initializing MyListener, you didn't call the parent class's __init__ method, so the listener wasn't initialized properly.
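To illustrate points 1 and 2 without needing Twitter credentials, here is a minimal sketch that does not use Tweepy at all. `FakeStream` and `TimeLimitedListener` are hypothetical stand-ins written for this example, and the injectable `clock` parameter is an assumption added so the behavior can be checked deterministically. The sketch only mimics the general idea that the stream owns the read loop and stops when `on_data()` returns False; it is not Tweepy's actual implementation.

```python
import time


class FakeStream:
    """Hypothetical stand-in for a streaming client.

    It owns the read loop and hands each message to the listener,
    stopping as soon as on_data() returns False.
    """

    def __init__(self, listener, messages):
        self.listener = listener
        self.messages = messages
        self.running = False

    def filter(self):
        self.running = True
        for message in self.messages:
            if not self.running:
                break
            # The listener signals "stop" by returning False.
            if self.listener.on_data(message) is False:
                self.running = False
        return "stopped"


class TimeLimitedListener:
    """Listener with no loop of its own: one time check per message."""

    def __init__(self, time_limit, clock=time.time):
        self.clock = clock  # injectable clock (assumption, for testing)
        self.start_time = clock()
        self.limit = time_limit
        self.received = []

    def on_data(self, data):
        if self.clock() - self.start_time < self.limit:
            self.received.append(data)
            return True
        # Time limit reached: ask the stream to stop.
        return False
```

Because the listener never loops, control returns to the stream after every message, and the code following `filter()` runs once the limit is reached.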

So for what you're trying to do, the code should be something like:

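import time

import tweepy

class MyStreamListener(tweepy.StreamListener):
    def __init__(self, time_limit=60):
        self.start_time = time.time()
        self.limit = time_limit
        self.saveFile = open('abcd.json', 'a')
        # Initialize the parent StreamListener as well.
        super(MyStreamListener, self).__init__()

    def on_data(self, data):
        if (time.time() - self.start_time) < self.limit:
            # Still within the time limit: save the raw tweet and keep streaming.
            self.saveFile.write(data)
            self.saveFile.write('\n')
            return True
        else:
            # Time is up: close the file and return False to stop the stream.
            self.saveFile.close()
            return False

# `api` is the tweepy.API object created in the question's code.
myStream = tweepy.Stream(auth=api.auth, listener=MyStreamListener(time_limit=20))
myStream.filter(track=['test'])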
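One caveat with checking the time inside on_data() is that the check only runs when a message arrives, so a quiet track term could keep the stream open past the limit. A possible complement, sketched below, is a background timer that calls a stop function after a fixed delay. `stop_after` is a hypothetical helper written for this example; passing `myStream.disconnect` as the stop function assumes the Tweepy 3.x `Stream.disconnect()` method, and how quickly the blocking read then returns is not guaranteed.

```python
import threading


def stop_after(seconds, stop):
    """Call stop() once after `seconds`, from a background daemon thread.

    Hypothetical helper: `stop` can be any zero-argument callable.
    """
    timer = threading.Timer(seconds, stop)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    return timer


# Possible usage with the stream above (assumes Tweepy 3.x):
#   stop_after(300, myStream.disconnect)
#   myStream.filter(track=['test'])
```

The returned timer can be cancelled with `timer.cancel()` if the stream ends earlier for another reason.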
