Tweepy Connection broken: IncompleteRead - best way to handle exception? Or can threading help avoid it?

Views: 172 Published: 2020/6/10 23:23:33 Tags: python multithreading exception-handling tweepy

Problem description

I am using tweepy to handle a large twitter stream (following 4,000+ accounts). The more accounts that I add to the stream, the more likely I am to get this error:

Traceback (most recent call last):
  File "myscript.py", line 2103, in <module>
    main()
  File "myscript.py", line 2091, in main
    twitter_stream.filter(follow=USERS_TO_FOLLOW_STRING_LIST, stall_warnings=True)
  File "C:\Python27\lib\site-packages\tweepy\streaming.py", line 445, in filter
    self._start(async)
  File "C:\Python27\lib\site-packages\tweepy\streaming.py", line 361, in _start
    self._run()
  File "C:\Python27\lib\site-packages\tweepy\streaming.py", line 294, in _run
    raise exception
requests.packages.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read, 2000 more expected)', IncompleteRead(0 bytes read, 2000 more expected))

Obviously that is a thick firehose; empirically, it's too thick to handle. Based on researching this error on Stack Overflow, as well as the empirical trend that 'the more accounts I add to follow, the faster this exception occurs', my hypothesis is that this is 'my fault': my processing of each tweet takes too long and/or my firehose is too thick. I get that.

But notwithstanding that setup, I still have two questions that I can't seem to find solid answers for.
1. Is there a way to simply 'handle' this exception, accept that I will miss some tweets, but keep the script running? I figure maybe it misses a tweet (or many tweets), but if I can live without 100% of the tweets I want, then the script/stream can still go on, ready to catch the next tweet whenever it can.

I've tried this exception handling, which was recommended in a similar question on Stack Overflow:

    from urllib3.exceptions import ProtocolError

    while True:
        try:
            twitter_stream.filter(follow=USERS_TO_FOLLOW_STRING_LIST, stall_warnings=True)
        except ProtocolError:
            continue

But unfortunately for me (perhaps I implemented it incorrectly, but I don't think I did), that did not work. I get the exact same error with or without that recommended exception handling code in place.
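
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback names `requests.packages.urllib3.exceptions.ProtocolError`, and in some older `requests` releases that bundle their own copy of urllib3, that class may not be the same object as `urllib3.exceptions.ProtocolError`, in which case the `except` clause above would never match. The sketch below (written for Python 3) shows a reconnect loop that takes the exception classes to retry on as a parameter. The names `run_with_reconnect`, `FakeProtocolError` and `fake_stream` are hypothetical and only there so the loop can be run without a Twitter connection.

```python
import time


def run_with_reconnect(start_stream, retry_on, max_retries=5, base_delay=0.0):
    """Call start_stream() and restart it whenever an exception in retry_on is raised.

    Returns the number of reconnects that were needed before a clean exit.
    """
    reconnects = 0
    while True:
        try:
            start_stream()       # blocks until the stream ends or raises
            return reconnects    # clean exit
        except retry_on:
            reconnects += 1
            if reconnects > max_retries:
                raise            # give up instead of looping forever
            time.sleep(base_delay * reconnects)  # simple linear backoff


# Stand-ins for demonstration only. In real use (an assumption), start_stream
# would wrap twitter_stream.filter(...) and retry_on would be the exact
# exception class shown in the traceback.
class FakeProtocolError(Exception):
    pass


calls = {"n": 0}


def fake_stream():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeProtocolError("Connection broken: IncompleteRead")


reconnect_count = run_with_reconnect(fake_stream, (FakeProtocolError,))
```

Tweets sent while the connection is down are missed, which matches the "accept that I will miss some tweets" trade-off described in the question.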

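I have never implemented queues and/or threading in my Python code. Would this be a good time for me to try to implement that? I don't know anything about queues/threads, but I am imagining...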

Could I have the tweets sort of written - in the raw - pre-processing - to memory, or a database, or something, on one thread? And then, have a second thread ready to do the processing of those tweets, as soon as it's ready? I figure that way, at least, it takes my post-processing of the tweet out of the equation as a limiting factor on the bandwidth of the firehose I am reading. Then if I still get the error I can cut back on who I am following, etc.

I have watched some threading tutorials but figured it might be worth asking whether that 'works' with this tweepy/twitter/etc. complex. I am not confident in my understanding of the problem I have or how threading might help, so I figured I could ask for advice as to whether it would indeed help me here.

If this idea is valid, is there a sort of simple piece of example code someone could help me with to point me in the right direction?

Recommended answer

I think I solved this problem by finally completing my first queue/thread implementation. I am not learned enough to know the best way to do this, but I think this way does work. Using the code below, I now build up a queue of new tweets and can handle them as I wish from the queue, rather than falling behind and losing my connection with tweepy.

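import tweepy
from Queue import Queue  # Python 2; on Python 3 use: from queue import Queue
from threading import Thread

class My_Parser(tweepy.StreamListener):

    def __init__(self, q=None):
        super(My_Parser, self).__init__()
        num_worker_threads = 4
        # Create the queue per instance; a Queue() default argument would be shared.
        self.q = q if q is not None else Queue()
        for i in range(num_worker_threads):
            t = Thread(target=self.do_stuff)
            t.daemon = True  # workers exit when the main thread exits
            t.start()

    def on_data(self, data):
        # Only enqueue the raw data here, so reading the stream is never held up by processing.
        self.q.put(data)
        return True  # keep the stream open

    def do_stuff(self):
        # Worker loop: take raw tweets off the queue and process them.
        while True:
            do_whatever(self.q.get())  # do_whatever is your own processing function
            self.q.task_done()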
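
To try the same queue/worker pattern without a live Twitter connection, here is a minimal self-contained sketch. It is written for Python 3 (so it imports `queue` rather than `Queue`). The names `start_workers` and `process` and the sample strings are hypothetical stand-ins for the listener's worker threads, `do_whatever`, and incoming tweets.

```python
import threading
from queue import Queue


def start_workers(q, handler, num_workers=4):
    """Start daemon threads that pull items from q and pass them to handler."""
    def worker():
        while True:
            item = q.get()
            try:
                handler(item)
            finally:
                q.task_done()  # always mark the item done so q.join() cannot hang

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()


results = []
results_lock = threading.Lock()


def process(raw):
    # Hypothetical post-processing step; the lock protects the shared list.
    with results_lock:
        results.append(raw.upper())


q = Queue()
start_workers(q, process)
for raw in ["tweet one", "tweet two", "tweet three"]:
    q.put(raw)  # the "stream" side only enqueues, so it never waits on processing
q.join()        # block until every queued item has been processed
print(sorted(results))
```

Because several workers run concurrently, items may finish in a different order than they arrived. If order matters, a single worker thread is the simpler choice.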
I did continue digging for a while into the IncompleteRead error, and I tried numerous other exception-handling solutions using the URL and HTTP libraries, but I struggled with that. And I think there may be some benefits to the queueing approach anyway, beyond just keeping the connection (for one, it won't lose data).
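
One caveat worth sketching: an unbounded queue keeps every tweet, but it can grow without limit if the workers are slower than the stream. If dropping some tweets is acceptable, as the question suggests, a bounded queue with a non-blocking put is one option. The helper name `offer` and the tiny `maxsize` below are illustrative assumptions (Python 3).

```python
from queue import Queue, Full


def offer(q, item):
    """Try to enqueue item without blocking; return False if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except Full:
        return False  # caller can count or log the dropped item


q = Queue(maxsize=2)  # small limit chosen only to make the demo obvious
accepted = [offer(q, n) for n in range(4)]
dropped = accepted.count(False)
```

In a listener, `on_data` could call such a helper instead of `self.q.put(data)`, so the stream-reading thread never blocks even when the workers fall behind.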

Hope this helps someone. Haha.
