Why is BeautifulSoup related to 'Task exception was never retrieved'?


Question

I want to use a coroutine to crawl and parse web pages, so I wrote a sample to test the idea. The program runs well under Python 3.5 on Ubuntu 16.04 and exits once all the work is done. The source code is below.

import aiohttp
import asyncio
from bs4 import BeautifulSoup

async def coro():
    coro_loop = asyncio.get_event_loop()
    url = u'https://www.python.org/'
    for _ in range(4):
        async with aiohttp.ClientSession(loop=coro_loop) as coro_session:
            with aiohttp.Timeout(30, loop=coro_session.loop):
                async with coro_session.get(url) as resp:
                    print('get response from url: %s' % url)
                    source_code = await resp.read()
                    soup = BeautifulSoup(source_code, 'lxml')

def main():
    loop = asyncio.get_event_loop()
    worker = loop.create_task(coro())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        print('keyboard interrupt')
        worker.cancel()
    finally:
        loop.stop()
        loop.run_forever()
        loop.close()

if __name__ == '__main__':
    main()

While testing, I found that when I shut the program down with Ctrl+C, the error 'Task exception was never retrieved' is reported.

^Ckeyboard interrupt
Task exception was never retrieved
future: <Task finished coro=<coro() done, defined at ./test.py:8> exception=KeyboardInterrupt()>
Traceback (most recent call last):
  File "./test.py", line 23, in main
    loop.run_until_complete(worker)
  File "/usr/lib/python3.5/asyncio/base_events.py", line 375, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 345, in run_forever
    self._run_once()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 1312, in _run_once
    handle._run()
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.5/asyncio/tasks.py", line 307, in _wakeup
    self._step()
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "./test.py", line 17, in coro
    soup = BeautifulSoup(source_code, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 215, in __init__
    self._feed()
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "/usr/lib/python3/dist-packages/bs4/builder/_lxml.py", line 240, in feed
    self.parser.feed(markup)
  File "src/lxml/parser.pxi", line 1194, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:119773)
  File "src/lxml/parser.pxi", line 1316, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:119644)
  File "src/lxml/parsertarget.pxi", line 141, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:137264)
  File "src/lxml/parsertarget.pxi", line 135, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:137128)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:11090)
  File "src/lxml/saxparser.pxi", line 499, in lxml.etree._handleSaxData (src/lxml/lxml.etree.c:131013)
  File "src/lxml/parsertarget.pxi", line 88, in lxml.etree._PythonSaxParserTarget._handleSaxData (src/lxml/lxml.etree.c:136397)
  File "/usr/lib/python3/dist-packages/bs4/builder/_lxml.py", line 206, in data
    def data(self, content):
KeyboardInterrupt

I looked through the official Python docs but did not find a clue. I then tried to catch the KeyboardInterrupt inside coro().

try:
    soup = BeautifulSoup(source_code, 'lxml')
except KeyboardInterrupt:
    print('capture exception')
    raise

Every time the try/except around BeautifulSoup() catches the KeyboardInterrupt, the error occurs. It seems that BeautifulSoup contributes to the error. How can I fix it?

Recommended answer

When you call task.cancel(), the task is not cancelled right away; the call only marks the task for cancellation. The actual cancellation begins when the task next resumes execution: asyncio.CancelledError is raised inside the task at that point, forcing it to stop. The task then finishes with this exception.
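The two-step nature of cancellation can be observed directly. The following is a minimal sketch, not code from the question: worker and run_demo are illustrative names, and the long sleep stands in for any awaited operation.

```python
import asyncio


async def worker(log):
    try:
        # Stand-in for any awaited operation the task is suspended on.
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append("CancelledError raised inside worker")
        raise  # re-raise so the task really ends up cancelled


async def demo():
    log = []
    task = asyncio.ensure_future(worker(log))
    await asyncio.sleep(0)  # give worker a chance to start and suspend
    task.cancel()           # only requests cancellation
    log.append("cancelled right after cancel(): %s" % task.cancelled())
    try:
        await task          # the task resumes here and receives CancelledError
    except asyncio.CancelledError:
        pass
    log.append("cancelled after awaiting: %s" % task.cancelled())
    return log


def run_demo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(demo())
    finally:
        loop.close()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The task reports itself as cancelled only after the loop has had a chance to run it again, which is why the caller needs to wait for it after calling cancel().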

On the other hand, asyncio warns you when a task has finished with an exception that nobody retrieved (that is, you never checked the result of the task).
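As a rough illustration of what "retrieving" means, the sketch below lets a task fail and then reads the stored exception with task.exception(). The names failing and run_demo are made up for this example, and ValueError stands in for whatever error a real task might raise.

```python
import asyncio


async def failing():
    # Stand-in for a task body that ends with an error.
    raise ValueError("boom")


def run_demo():
    loop = asyncio.new_event_loop()
    try:
        task = loop.create_task(failing())
        # asyncio.wait() waits for the task without re-raising its exception.
        loop.run_until_complete(asyncio.wait([task]))
        # Reading the exception marks it as retrieved, so no warning is logged
        # when the task object is later garbage collected.
        exc = task.exception()
        return type(exc).__name__, str(exc)
    finally:
        loop.close()


if __name__ == "__main__":
    print(run_demo())
```

Calling task.result() or awaiting the task has the same effect of retrieving the exception, except that those re-raise it instead of returning it.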

To avoid the problem, wait for the cancellation to complete so that you receive the asyncio.CancelledError (and probably suppress it, since you have no further use for it):

import asyncio
from contextlib import suppress


async def coro():
    # ... (same body as in the question)
    pass

def main():
    loop = asyncio.get_event_loop()
    worker = asyncio.ensure_future(coro())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        print('keyboard interrupt')

        worker.cancel()
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(worker)  # await task cancellation.
    finally:
        loop.close()

if __name__ == '__main__':
    main()
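The log in the question shows the task as already finished with exception=KeyboardInterrupt(), which suggests the interrupt arrived while the task was busy in synchronous parsing code. In that situation cancel() has nothing left to cancel, and running the finished task again would re-raise the stored exception. The following is a hedged variation on the code above, not part of the original answer: settle is a hypothetical helper name, and the interrupted coroutine merely simulates Ctrl+C by raising KeyboardInterrupt itself.

```python
import asyncio
from contextlib import suppress


def settle(loop, task):
    """Cancel a pending task, or collect the exception of a finished one.

    `settle` is a hypothetical helper for this sketch, not an asyncio API.
    """
    if not task.done():
        task.cancel()
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(task)  # wait for the cancellation
        return None
    if task.cancelled():
        return None
    # The task already finished; reading its exception marks it as retrieved.
    return task.exception()


async def interrupted():
    # Simulates Ctrl+C arriving while the task runs synchronous code.
    raise KeyboardInterrupt


def run_demo():
    loop = asyncio.new_event_loop()
    task = loop.create_task(interrupted())
    try:
        loop.run_until_complete(task)
    except KeyboardInterrupt:
        return type(settle(loop, task)).__name__
    finally:
        loop.close()


if __name__ == "__main__":
    print(run_demo())
```

In main() this would replace the worker.cancel() and suppress block inside the except KeyboardInterrupt handler with a single call to settle(loop, worker).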
