Unable to stop Streaming in tweepy after one minute
Problem description
I am trying to stream Twitter data for a fixed period, say 5 minutes, using the Stream.filter() method, and I am storing the retrieved tweets in a JSON file. The problem is that I cannot stop filter() from within the program; I have to stop the execution manually. I tried stopping based on system time using the time package. With that, tweets stop being written to the JSON file, but the stream itself keeps running, so execution never continues to the next line of code. I am using an IPython notebook to write and execute the code. Here's the code:
import time

import tweepy
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)


class MyListener(StreamListener):
    def __init__(self, start_time, time_limit=60):
        self.time = start_time
        self.limit = time_limit

    def on_data(self, data):
        while (time.time() - self.time) < self.limit:
            try:
                saveFile = open('abcd.json', 'a')
                saveFile.write(data)
                saveFile.write('\n')
                saveFile.close()
                return True
            except BaseException as e:
                print('failed ondata, ' + str(e))
                time.sleep(5)
                return True

    def on_status(self, status):
        if (time.time() - self.time) >= self.limit:
            print('time is over')
            return False

    def on_error(self, status):
        if (time.time() - self.time) >= self.limit:
            print('time is over')
            return False
        else:
            print(status)
            return True


start_time = time.time()
stream_data = Stream(auth, MyListener(start_time, 20))
stream_data.filter(track=['name1','name2',...list ...,'name n'])  # list of the strings I want to track
These questions are similar, but they do not answer my question directly:
Tweepy: Stream data for X minutes?
Stopping Tweepy steam after a duration parameter (# lines, seconds, #Tweets, etc)
Tweepy Streaming - Stop collecting tweets at x amount
I used this link as my reference: http://stats.seandolinar.com/collecting-twitter-data-using-a-python-stream-listener/
In order to close the stream, you need to return False from on_data() or on_status().
Because tweepy.Stream() runs a while loop itself, you don't need the while loop in on_data().
When initializing MyListener, you didn't call the parent class's __init__ method, so it wasn't initialized properly.
So for what you're trying to do, the code should be something like:
import time

import tweepy


class MyStreamListener(tweepy.StreamListener):
    def __init__(self, time_limit=60):
        self.start_time = time.time()
        self.limit = time_limit
        self.saveFile = open('abcd.json', 'a')
        super(MyStreamListener, self).__init__()

    def on_data(self, data):
        if (time.time() - self.start_time) < self.limit:
            self.saveFile.write(data)
            self.saveFile.write('\n')
            return True
        else:
            # Time is up: close the file and return False to stop the stream.
            self.saveFile.close()
            return False


myStream = tweepy.Stream(auth=api.auth, listener=MyStreamListener(time_limit=20))
myStream.filter(track=['test'])
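
To see why returning False is what ends the stream, here is a minimal sketch that runs without tweepy or a network connection. It is only a simulation: run_fake_stream is a hypothetical stand-in for the library's internal read loop, and TimeLimitedWriter and its injectable clock argument are names invented for this illustration, not part of tweepy.

```python
import io
import itertools


class TimeLimitedWriter:
    """Writes each record to `out` until `limit` seconds have elapsed."""

    def __init__(self, out, limit, clock):
        self.out = out
        self.limit = limit
        self.clock = clock        # callable returning the current time in seconds
        self.start = clock()

    def on_data(self, data):
        if self.clock() - self.start < self.limit:
            self.out.write(data + "\n")
            return True           # keep the stream open
        return False              # ask the caller to stop


def run_fake_stream(handler, records):
    """Hypothetical stand-in for the library's read loop: stops on False."""
    delivered = 0
    for record in records:
        if handler.on_data(record) is False:
            break
        delivered += 1
    return delivered


# Deterministic clock for the demo: each call advances by one second.
ticks = itertools.count()
buffer = io.StringIO()
writer = TimeLimitedWriter(buffer, limit=3, clock=lambda: next(ticks))
delivered = run_fake_stream(writer, ("tweet %d" % i for i in range(100)))
```

With a real stream you would pass time.time as the clock. The point of the sketch is that the handler contains no loop of its own: it checks the elapsed time once per call and lets the caller's loop decide whether to continue.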