Stopping Tweepy stream after a duration parameter (# lines, seconds, #Tweets, etc)
Question
I am using Tweepy to capture streaming tweets based off of the hashtag #WorldCup, as seen by the code below. It works as expected.
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener


class StdOutListener(StreamListener):
    ''' Handles data received from the stream. '''

    def on_status(self, status):
        # Prints the text of the tweet
        print('Tweet text: ' + status.text)
        # There are many options in the status object,
        # hashtags can be very easily accessed.
        for hashtag in status.entities['hashtags']:
            print(hashtag['text'])
        return True  # To continue listening

    def on_error(self, status_code):
        print('Got an error with status code: ' + str(status_code))
        return True  # To continue listening

    def on_timeout(self):
        print('Timeout...')
        return True  # To continue listening


if __name__ == '__main__':
    # consumer_key, consumer_secret, access_token and access_token_secret
    # are assumed to be defined elsewhere with your own credentials.
    listener = StdOutListener()
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, listener)
    # User ID passed as a string
    stream.filter(follow=['38744894'], track=['#WorldCup'])
Because this is a hot hashtag right now, searches don't take too long to catch the maximum amount of tweets that Tweepy lets you get in one transaction. However, if I was going to search on #StackOverflow, it might be much slower, and therefore, I'd like a way to kill the stream. I could do this on several parameters, such as stopping after 100 tweets, stopping after 3 minutes, after a text output file has reached 150 lines, etc. I do know that the socket timeout time isn't used to achieve this.
I have taken a look at this similar question:
However, it appears to not use the streaming API. The data that it collects is also very messy, whereas this text output is clean.
Can anyone suggest a way to stop Tweepy (when using the stream in this method), based on some user input parameter, besides a keyboard interrupt?
Thanks
Recommended answer
I solved this, so I'm going to be one of those internet heroes that answers their own question.
This is achieved by using static Python variables (class attributes) for the counter and for the stop value (e.g. stop after you grab 20 tweets). Once the counter reaches the stop value, on_status returns False, which is what ends the stream. This is currently a geolocation search, but you could easily swap it for a hashtag search by using the getTweetsByHashtag() method.
#!/usr/bin/env python
from tweepy import (Stream, OAuthHandler)
from tweepy.streaming import StreamListener


class Listener(StreamListener):

    tweet_counter = 0  # Static variable shared by all instances

    def login(self):
        # Fill in your own credentials here.
        CONSUMER_KEY = ''
        CONSUMER_SECRET = ''
        ACCESS_TOKEN = ''
        ACCESS_TOKEN_SECRET = ''

        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

    def on_status(self, status):
        Listener.tweet_counter += 1
        print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
              % (status.author.screen_name, status.text.replace('\n', ' ')))

        if Listener.tweet_counter < Listener.stop_at:
            return True  # Keep listening
        else:
            print('Max num reached = ' + str(Listener.tweet_counter))
            return False  # Stop the stream

    # The bounding box is given as longitude/latitude pairs,
    # southwest corner first.
    def getTweetsByGPS(self, stop_at_number, longitude_start, latitude_start, longitude_finish, latitude_finish):
        try:
            Listener.stop_at = stop_at_number  # Create static variable
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)  # Socket timeout value
            streaming_api.filter(follow=None, locations=[longitude_start, latitude_start, longitude_finish, latitude_finish])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

    def getTweetsByHashtag(self, stop_at_number, hashtag):
        try:
            Listener.stop_at = stop_at_number  # Same static variable that on_status reads
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)
            streaming_api.filter(track=[hashtag])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')


listener = Listener()
listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601)  # Atlanta area.
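
The question also asks about stopping after a time limit (e.g. 3 minutes), which the counter above does not cover. What follows is a minimal sketch of one way to combine both limits, not part of the original answer. StopCondition and its parameters are hypothetical names and not part of Tweepy; the class is plain Python, so it can be tried without credentials.

```python
import time


class StopCondition:
    """Hypothetical helper (not part of Tweepy): tracks tweets seen and time elapsed."""

    def __init__(self, max_tweets=None, max_seconds=None, clock=time.monotonic):
        self.max_tweets = max_tweets      # None means "no tweet limit"
        self.max_seconds = max_seconds    # None means "no time limit"
        self._clock = clock               # injectable so the class can be tested
        self._start = clock()
        self.count = 0

    def should_stop(self):
        """Return True once either configured limit has been reached."""
        if self.max_tweets is not None and self.count >= self.max_tweets:
            return True
        if self.max_seconds is not None and self._clock() - self._start >= self.max_seconds:
            return True
        return False

    def record(self):
        """Call once per received tweet; returns True while listening should continue."""
        self.count += 1
        return not self.should_stop()
```

Assuming the listener stores an instance in an attribute such as self.stop (also a hypothetical name), on_status could end with `return self.stop.record()`, so that it returns False as soon as a limit is hit. Note that in this sketch the time limit is only checked when a tweet arrives, so on a quiet hashtag the stream would keep running until the next tweet comes in.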