Streaming API with tweepy only returns the second-to-last tweet and NOT the immediately last tweet

Question

I'm new not only to Python but to programming altogether, so I'd really appreciate your help!

I am trying to use Tweepy to filter the Twitter streaming API and detect all tweets from a given user.

I have filtered by user id and have confirmed that tweets are being collected in real-time.

HOWEVER, it seems that only the second-to-last tweet is being collected in real time, as opposed to the very latest tweet.

Can you help?

import tweepy
import webbrowser
import time
import sys

consumer_key = 'xyz'
consumer_secret = 'zyx'


## Getting access key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth_url = auth.get_authorization_url()
print 'From your browser, please click AUTHORIZE APP and then copy the unique PIN: ' 
webbrowser.open(auth_url)
verifier = raw_input('PIN: ').strip()
auth.get_access_token(verifier)
access_key = auth.access_token.key
access_secret = auth.access_token.secret


## Authorizing account privileges
auth.set_access_token(access_key, access_secret)


## Get the local time
localtime = time.asctime( time.localtime(time.time()) )


## Status changes
api = tweepy.API(auth)
api.update_status('It worked - Current time is %s' % localtime)
print 'It worked - now go check your status!'


## Filtering the firehose
user = []
print 'Follow tweets from which user ID?'
handle = raw_input(">")
user.append(handle)

keywords = []
print 'What keywords do you want to track? Separate with commas.'
key = raw_input(">")
keywords.append(key)

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):

        # We'll simply print some values in a tab-delimited format
        # suitable for capturing to a flat file, but you could opt to
        # store them elsewhere, retweet select statuses, etc.

        try:
            print "%s\t%s\t%s\t%s" % (status.text, 
                                      status.author.screen_name, 
                                      status.created_at, 
                                      status.source,)
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e
            pass

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

# Create a streaming API with no read timeout (timeout=None).

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=None)

# Optionally filter the statuses you want to track by providing a list
# of users to "follow".

print >> sys.stderr, "Filtering public timeline for %s" % keywords

streaming_api.filter(follow=user, track=keywords)
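One pitfall worth checking in the call above: the follow argument should be a list of user IDs (like user), not a bare string (like handle). In the tweepy releases of that era, Stream.filter() appears to have serialized follow by joining its items with commas, roughly ','.join(map(str, follow)); treat that exact line as an assumption about those versions. The hypothetical helper below mimics that join to show what a bare string turns into (Python 3):

```python
def build_follow_param(follow):
    """Hypothetical stand-in for how filter() serialized `follow`:
    join the items with commas (assumed to match tweepy of that era)."""
    return ",".join(map(str, follow))


# A bare string is iterated character by character...
print(build_follow_param("12345"))    # 1,2,3,4,5
# ...while a one-element list yields the intended user ID.
print(build_follow_param(["12345"]))  # 12345
```

With a bare string, the stream would be asked to follow users "1", "2", "3" and so on, rather than the one account you typed in.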

Recommended answer

I had this same problem. In my case the answer was not as simple as running Python unbuffered, and I suspect that didn't solve the original poster's problem either. The problem is actually in the tweepy package itself, in the _read_loop() function of streaming.py, which I think needs to be updated to reflect changes in the format Twitter uses to output data from its streaming API.
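Assuming the format in question is Twitter's length-delimited framing (each message preceded by a line holding its size in bytes), a reader only has to do two things: skip blank keep-alive lines, then read exactly the announced number of bytes. Because it never waits for more data than the length says, the last message is available as soon as its bytes arrive. The following is a minimal Python 3 sketch of that idea over an in-memory stream; iter_length_delimited and frame are hypothetical names, not tweepy code:

```python
import io
import json


def iter_length_delimited(stream):
    """Yield each JSON message from a length-delimited byte stream.

    Assumed framing: an ASCII decimal byte count ending in CRLF, then exactly
    that many bytes of JSON. Bare CRLF lines between messages are keep-alives.
    """
    while True:
        line = stream.readline()
        if not line:                  # EOF: the connection was closed
            return
        line = line.strip()
        if not line:                  # keep-alive newline, nothing to parse
            continue
        length = int(line.decode("ascii"))
        body = stream.read(length)    # read exactly the announced byte count
        if len(body) < length:        # stream ended mid-message
            return
        yield json.loads(body)


def frame(obj):
    """Hypothetical helper: encode one message in the assumed framing."""
    body = json.dumps(obj).encode("utf-8") + b"\r\n"
    return str(len(body)).encode("ascii") + b"\r\n" + body


raw = b"\r\n" + frame({"text": "first"}) + b"\r\n" + frame({"text": "second"})
print([m["text"] for m in iter_length_delimited(io.BytesIO(raw))])
# ['first', 'second']
```

On a real socket, a single read() may return fewer bytes than requested, so production code would loop until the full length has arrived; io.BytesIO never short-reads, which keeps the sketch simple.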

The solution for me was to download the newest tweepy code from GitHub, https://github.com/tweepy/tweepy, specifically the streaming.py file. The commit history for that file shows the recent changes made to try to resolve this issue.

I looked into the details of the tweepy classes and found a problem with the way streaming.py reads in the JSON tweet stream. I think it has to do with Twitter updating its streaming API to prefix each incoming status with its length in bytes. Long story short, here is the function I replaced in streaming.py to resolve this problem:

def _read_loop(self, resp):

    while self.running and not resp.isclosed():

        # Note: keep-alive newlines might be inserted before each length value.
        # read until we get a digit...
        c = '\n'
        while c == '\n' and self.running and not resp.isclosed():
            c = resp.read(1)
        delimited_string = c

        # read rest of delimiter length..
        d = ''
        while d != '\n' and self.running and not resp.isclosed():
            d = resp.read(1)
            delimited_string += d

        try:
            int_to_read = int(delimited_string)
            next_status_obj = resp.read( int_to_read )
            # print 'status_object = %s' % next_status_obj
            self._data(next_status_obj)
        except ValueError:
            pass 

    if resp.isclosed():
        self.on_closed(resp)
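The replaced function can be exercised offline to confirm that it hands over the latest status without waiting for the next one. The sketch below is a Python 3 port of the _read_loop above (bytes instead of str, with the length line decoded before int()). FakeResponse, Reader and frame are hypothetical stand-ins for tweepy's HTTP response and listener plumbing, not tweepy APIs, and _data here simply decodes the JSON:

```python
import io
import json


class FakeResponse:
    """Hypothetical stand-in for the HTTP response: serves bytes from memory
    and reports itself closed once a read comes back empty."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self._closed = False

    def read(self, n):
        chunk = self._buf.read(n)
        if not chunk:
            self._closed = True
        return chunk

    def isclosed(self):
        return self._closed


class Reader:
    """Minimal host for a Python 3 port of the answer's _read_loop."""

    def __init__(self):
        self.running = True
        self.received = []
        self.closed = False

    def _data(self, raw):
        self.received.append(json.loads(raw))

    def on_closed(self, resp):
        self.closed = True

    def _read_loop(self, resp):
        while self.running and not resp.isclosed():
            # Skip keep-alive newlines until the first byte of a length value.
            c = b"\n"
            while c == b"\n" and self.running and not resp.isclosed():
                c = resp.read(1)
            delimited_string = c
            # Read the rest of the length line, up to and including "\n".
            d = b""
            while d != b"\n" and self.running and not resp.isclosed():
                d = resp.read(1)
                delimited_string += d
            try:
                int_to_read = int(delimited_string.decode("ascii"))
                self._data(resp.read(int_to_read))
            except ValueError:
                pass  # not a length line (e.g. a "\r\n" keep-alive); try again
        if resp.isclosed():
            self.on_closed(resp)


def frame(obj):
    """Hypothetical helper: length line (in bytes), then the JSON body."""
    body = json.dumps(obj).encode("utf-8") + b"\r\n"
    return str(len(body)).encode("ascii") + b"\r\n" + body


reader = Reader()
reader._read_loop(FakeResponse(frame({"text": "first"}) + b"\r\n" + frame({"text": "second"})))
print([s["text"] for s in reader.received], reader.closed)
# ['first', 'second'] True
```

Both statuses, including the final one, reach _data before the stream is closed; a "\r\n" keep-alive simply fails int() and is skipped on the next pass.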

This solution also requires downloading the tweepy source code, modifying it, and then installing the modified library into Python. To install it, go into your top-level tweepy directory and run something like sudo python setup.py install, depending on your system.
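After installing the patched copy, it is worth confirming that Python actually imports it rather than an older tweepy elsewhere on sys.path. A small check using the standard importlib.util.find_spec (module_location is a hypothetical helper name) might look like this:

```python
import importlib.util


def module_location(name):
    """Return the file that `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# After installing, this should point into the freshly installed package
# (for example somewhere under site-packages); None means not installed.
print(module_location("tweepy"))
```

If the printed path belongs to a different install than the one you patched, that older copy is still the one being imported.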

I've also left a comment on GitHub for this package's maintainers to let them know what's going on.
