Tweepy: Stream data for X minutes?

Question

I'm using tweepy to datamine the public stream of tweets for keywords. This is pretty straightforward and has been described in multiple places:

http://runnable.com/Us9rrMiTWf9bAAW3/how-to-stream-data-from-twitter-with-tweepy-for-python

http://adilmoujahid.com/posts/2014/07/twitter-analytics/

Copying code directly from the second link:

#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contain the user credentials to access Twitter API
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"
consumer_key = "ENTER YOUR API KEY"
consumer_secret = "ENTER YOUR API SECRET"


#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':

    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    #This line filters Twitter Streams to capture data by the keywords: 'python', 'javascript', 'ruby'
    stream.filter(track=['python', 'javascript', 'ruby'])

What I can't figure out is how to stream this data into a Python variable instead of printing it to the screen. I'm working in an IPython notebook and want to capture the stream in some variable, foo, after streaming for a minute or so. Furthermore, how do I get the stream to time out? As written, it runs indefinitely.

Accessing Twitter's Streaming API with tweepy

Recommended answer

Yes, in the post, Adil Moujahid mentions that his code ran for 3 days. I adapted the same code and, for initial testing, made the following tweaks:

(a) Added a location filter so that only tweets from a limited area arrive, rather than every tweet worldwide that contains the keywords. See "How to add a location filter to tweepy module". From there, you can create an intermediate variable in the code above as follows:

stream_all = Stream(auth, l)

Suppose we select the San Francisco area; we can then add:

stream_SFO = stream_all.filter(locations=[-122.75,36.8,-121.75,37.8])  

The assumption is that filtering by location takes less time than filtering by keyword.

(b) Then you can filter for the keywords:

tweet_iter = stream_SFO.filter(track=['python', 'javascript', 'ruby']) 
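
One caveat about steps (a) and (b): in Tweepy 3.x, `Stream.filter` blocks while the stream runs and returns `None`, so its result cannot be chained into a second `filter` call. Twitter's streaming endpoint also treated `track` and `locations` passed together as an OR, not an AND. One possible workaround, sketched below, is to stream by location only and apply the keyword test on the client side. The helper `matches_keywords` and the sample tweets are hypothetical and exist only for illustration.

```python
def matches_keywords(tweet, keywords):
    """Return True if the tweet's text contains any keyword (case-insensitive)."""
    text = tweet.get("text", "").lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical sample of decoded tweets, standing in for what a listener
# would collect from a call such as:
#     stream_all.filter(locations=[-122.75, 36.8, -121.75, 37.8])
sample_tweets = [
    {"id": 1, "text": "Learning Python in the Mission"},
    {"id": 2, "text": "Great coffee on Valencia St"},
    {"id": 3, "text": "ruby or javascript for this project?"},
]

keywords = ["python", "javascript", "ruby"]

# Keep only the location-filtered tweets that also mention a keyword.
matching = [t for t in sample_tweets if matches_keywords(t, keywords)]
```

A substring test like this is cruder than Twitter's own `track` matching, which works on whole terms, so it may need tightening for real use.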

(c) You can then write it to file as follows:

import json

with open('file_name.json', 'w') as f:
    json.dump(tweet_iter, f, indent=1)
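
For `json.dump` to produce something useful, the object passed in has to be JSON-serializable data, for example a list of decoded tweet dictionaries. A minimal sketch of that round trip follows. The sample tweets are hypothetical, and the temporary directory is used only so the example does not overwrite an existing file.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for tweets gathered from the stream.
tweets = [
    {"id": 1, "text": "Learning python"},
    {"id": 2, "text": "ruby vs javascript"},
]

# Written to a temporary directory here; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "file_name.json")
with open(path, "w") as f:
    json.dump(tweets, f, indent=1)

# Read the file back to confirm the round trip.
with open(path) as f:
    restored = json.load(f)
```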

This should take much less time. Coincidentally, I wanted to solve the same problem you posted today, so I don't have execution times yet.
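
The tweaks above narrow the stream but do not by themselves capture tweets in a variable or stop after a set time. A minimal sketch of one way to do both, assuming Tweepy 3.x (where `StreamListener` exists and the stream disconnects when `on_data` returns `False`): the class name `TimedListener` and its `time_limit` parameter are hypothetical, and the `ImportError` fallback is only there so the class can be tried without Tweepy 3.x installed.

```python
import json
import time

try:
    # Tweepy 3.x, the version the question's code targets.
    from tweepy.streaming import StreamListener
except ImportError:
    # Fallback so the sketch can be exercised without Tweepy 3.x installed.
    StreamListener = object


class TimedListener(StreamListener):
    """Hypothetical listener: stores tweets in a list, stops after time_limit seconds."""

    def __init__(self, time_limit=60):
        super().__init__()
        self.time_limit = time_limit
        self.start = time.monotonic()  # the clock starts when the listener is created
        self.tweets = []

    def on_data(self, data):
        # Once the limit has passed, return False so Tweepy 3.x disconnects.
        if time.monotonic() - self.start >= self.time_limit:
            return False
        self.tweets.append(json.loads(data))
        return True

    def on_error(self, status):
        print(status)
        return False  # stop on errors instead of retrying


# Usage with the question's setup (needs valid credentials, so left commented):
# listener = TimedListener(time_limit=60)
# stream = Stream(auth, listener)
# stream.filter(track=['python', 'javascript', 'ruby'])  # blocks until disconnect
# foo = listener.tweets
```

Because the check runs only when a message arrives, a quiet stream may keep running somewhat past the limit until the next tweet comes in.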

Hope this helps.