Getting a steady flow of messages from Twitter


Question

I'd like to build a simple Twitter client that learns my tastes and automatically finds friends and interesting tweets, so that it can give me relevant information.

To get started, I need a good stream of random Twitter messages so I can test a few machine learning algorithms on them.

Which API methods should I use for this? Do I have to poll regularly to get messages, or is there a way to have Twitter push messages as they are published?

I'd also be interested in hearing about any similar projects.

Recommended answer

I use tweepy to access the Twitter API and listen to the public stream it provides, which should be a one-percent sample of all tweets. Below is the sample code I use myself. You can still use basic authentication for streaming, though that may change soon. Set the USERNAME and PASSWORD variables accordingly, and make sure you respect the error codes that Twitter returns (this sample code may not follow the exponential backoff that Twitter expects in some cases).

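import sys
import time

import tweepy

# Placeholders: replace with your own Twitter credentials.
USERNAME = "your_username"
PASSWORD = "your_password"


def log_error(msg):
    """Write a timestamped message to stderr."""
    timestamp = time.strftime('%Y%m%d:%H%M:%S')
    sys.stderr.write("%s: %s\n" % (timestamp, msg))


# StreamListener and BasicAuthHandler belong to the early tweepy API
# this answer was written against.
class StreamWatcherListener(tweepy.StreamListener):
    def on_status(self, status):
        # Print the text of each tweet as it arrives.
        print(status.text)

    def on_error(self, status_code):
        log_error("Status code: %s." % status_code)
        time.sleep(3)
        return True  # keep stream alive

    def on_timeout(self):
        log_error("Timeout.")


def main():
    auth = tweepy.BasicAuthHandler(USERNAME, PASSWORD)
    listener = StreamWatcherListener()
    stream = tweepy.Stream(auth, listener)
    stream.sample()  # blocks while the public sample stream is delivered


if __name__ == '__main__':
    while True:
        try:
            main()
        except KeyboardInterrupt:
            break  # Ctrl-C stops the client
        except Exception as e:
            # Fixed 3-second pause before reconnecting, not exponential backoff.
            log_error("Exception: %s" % str(e))
            time.sleep(3)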
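
The sample above waits a fixed three seconds after a failure. As a rough sketch of the exponential backoff mentioned earlier, the retry delay could instead grow after each consecutive failure. The helper names `backoff_delays` and `run_with_backoff`, and the starting delay, growth factor, and cap, are all illustrative assumptions, not values taken from Twitter's documentation or from the original answer.

```python
import time


def backoff_delays(initial=1.0, factor=2.0, maximum=320.0):
    """Yield an endless series of delays that grows geometrically up to a cap.

    The default numbers are assumptions chosen for illustration only.
    """
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def run_with_backoff(connect, max_attempts=5, sleep=time.sleep):
    """Call connect() until it returns, sleeping longer after each failure.

    `connect` is any callable that raises an exception when the connection
    fails. The last failure is re-raised once max_attempts is exhausted.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(next(delays))
```

With the sample code, the `main` function could be passed as `connect`, for instance `run_with_backoff(main)`. The `sleep` parameter exists only so the waiting can be replaced when testing.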

I also set the socket module's default timeout. I believe I ran into problems with Python's default timeout behavior, so be careful.

import socket

timeout = 60.0  # seconds; an example value, not from the original answer
socket.setdefaulttimeout(timeout)
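
To make the effect of that call concrete, here is a small sketch showing that the module-wide default applies to sockets created afterwards. The 60-second value is an assumption for illustration, and the previous default is restored at the end so the snippet has no lasting side effect.

```python
import socket

# Remember the current default; None means sockets block indefinitely.
previous = socket.getdefaulttimeout()

socket.setdefaulttimeout(60.0)  # assumed example value, in seconds
try:
    # A socket created after the call picks up the new default timeout.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    new_timeout = s.gettimeout()
    s.close()
finally:
    # Put the earlier default back.
    socket.setdefaulttimeout(previous)
```

Because the default is global to the process, it affects every socket created after the call, including those opened by other libraries.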
