Twitter Streaming API - urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead


Problem description


I'm running a Python script using tweepy that streams a random sample of English tweets (via the Twitter Streaming API) for a minute, then alternates to searching (via the Twitter Search API) for a minute, and then returns to streaming. The issue I've found is that after roughly 40+ seconds the streaming crashes with the following error:

Full error:


urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))


The number of bytes read can vary from 0 to well into the thousands.


The first time this happens, the streaming cuts out prematurely and the search function starts early; after the search function finishes, the script returns to the stream once again, and on the second occurrence of this error the code crashes.

The code I'm running is:

# Handles date/time calculation
def calculateTweetDateTime(tweet):
    tweetDateTime = str(tweet.created_at)

    tweetDateTime = ciso8601.parse_datetime(tweetDateTime)
    time.mktime(tweetDateTime.timetuple())  # result is unused; this line can likely be dropped
    return tweetDateTime

# Checks whether the permitted time (60 s) has passed.
# Note: time.clock() was removed in Python 3.8; use time.perf_counter() there.
def hasTimeThresholdPast():
    global startTime
    return time.clock() - startTime > 60

# Override tweepy.StreamListener to add logic to on_status
# (subclassing the imported StreamListener and reusing its name works,
#  but a distinct name such as MyStreamListener would be clearer)
class StreamListener(StreamListener):

    def on_status(self, tweet):
        if hasTimeThresholdPast():
            return False

        if hasattr(tweet, 'lang'):
            if tweet.lang == 'en':

                try:
                    tweetText = tweet.extended_tweet["full_text"]
                except AttributeError:
                    tweetText = tweet.text

                tweetDateTime = calculateTweetDateTime(tweet)

                entityList = DataProcessing.identifyEntities(True, tweetText)
                DataStorage.storeHotTerm(entityList, tweetDateTime)
                DataStorage.storeTweet(tweet)


    def on_error(self, status_code):
        if status_code == 420:
            # Returning False in on_data disconnects the stream
            return False


def startTwitterStream():

    searchTerms = []

    myStreamListener = StreamListener()
    twitterStream = Stream(auth=api.auth, listener=myStreamListener)
    global geoGatheringTag
    # Note: tweepy renamed the `async` argument to `is_async` in v3.7,
    # since `async` became a reserved word in Python 3.7.
    if not geoGatheringTag:
        twitterStream.filter(track=['the', 'this', 'is', 'their', 'though', 'a', 'an'],
                             is_async=True, stall_warnings=True)
    else:
        twitterStream.filter(track=['the', 'this', 'is', 'their', 'though', 'a', 'an', "they're"],
                             is_async=False, locations=[-4.5091, 55.7562, -3.9814, 55.9563],
                             stall_warnings=True)



# ----------------------- Twitter API Functions ------------------------
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# --------------------------- Main Function ----------------------------

startTime = 0


def main():
    global startTime
    userInput = ""
    while userInput != "-1":
        userInput = input("Type ACTiVATE to activate the Crawler, or DATABASE to access data analytic option (-1 to exit): \n")
        if userInput.lower() == 'activate':
            while(True):
                startTime = time.clock()

                startTwitterStream()

                startTime = time.clock()
                startTwitterSearchAPI()

if __name__ == '__main__':
    main() 


I've trimmed out the search function and the database-handling aspects, given that they're separate, to avoid cluttering up the code.


If anyone has any idea why this is happening and how I might solve it, please let me know; I'd be curious about any insight.

Solutions I've tried:

A Try/Except block with http.client.IncompleteRead:
As per Error-while-fetching-tweets-with-tweepy

Setting stall_warnings = True:
As per Incompleteread-error-when-retrieving-twitter-data-using-python

Removing the English language filter.
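For reference, the Try/Except approach listed above can be sketched roughly as follows. This is a minimal illustration, not the asker's exact code: `run_stream_with_retry` is a hypothetical helper, and in practice the IncompleteRead usually surfaces wrapped in urllib3's ProtocolError, so a handler should catch both.

```python
# Sketch of the attempted Try/Except approach: restart the stream when the
# connection drops mid-read. run_stream_with_retry is a hypothetical helper.
import http.client

try:
    # urllib3 ships alongside requests; stub it out if unavailable
    from urllib3.exceptions import ProtocolError
except ImportError:
    class ProtocolError(Exception):
        pass

def run_stream_with_retry(start_stream, max_retries=3):
    """Call start_stream(); retry if the connection breaks mid-read."""
    for attempt in range(1, max_retries + 1):
        try:
            start_stream()   # e.g. twitterStream.filter(track=[...])
            return True      # stream ended cleanly
        except (http.client.IncompleteRead, ProtocolError) as err:
            print(f"Stream dropped (attempt {attempt}): {err!r}")
    return False             # gave up after max_retries drops
```

As the asker notes below, this only papers over the symptom; the stream still drops if the underlying cause (a processing backlog) remains.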

Recommended answer

Solved.


To those curious or experiencing a similar issue: after some experimentation I discovered that a backlog of incoming tweets was the issue. Every time the system received a tweet, it ran an entity-identification and storage process that cost a small amount of time; over the course of gathering several hundred to a thousand tweets, this backlog grew larger and larger until the API couldn't handle it and threw that error.


Solution: strip your "on_status/on_data/on_success" function down to the bare essentials and handle any computation, i.e. storing or entity identification, separately after the streaming session has closed. Alternatively, you could make your computation much more efficient so the time gap becomes insubstantial; up to you.
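A minimal sketch of that fix, assuming illustrative names: `BufferingListener` and `process_buffered_tweets` below are hypothetical stand-ins for the question's StreamListener, DataProcessing, and DataStorage, not real tweepy or project API.

```python
# Sketch of the accepted fix: the listener only buffers raw tweets, and the
# expensive work (entity identification, storage) runs after the streaming
# window closes, so no backlog builds up during the stream.
import time

class BufferingListener:
    """Collects raw tweets; does no per-tweet computation."""
    def __init__(self, window_seconds=60):
        self.buffer = []
        self.deadline = time.monotonic() + window_seconds

    def on_status(self, tweet):
        if time.monotonic() > self.deadline:
            return False            # returning False disconnects the stream
        self.buffer.append(tweet)   # cheap: just store, process later

def process_buffered_tweets(buffer):
    # Placeholder for the entity identification + storage done AFTER streaming;
    # here it just keeps the English tweets, mirroring the question's filter.
    return [t for t in buffer if getattr(t, "lang", None) == "en"]
```

After `on_status` returns False and the stream disconnects, the main loop would call `process_buffered_tweets` on the collected buffer before switching to the search phase.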
