Run code after asyncio run_until_complete() statement has finished

Question

I am fairly new to asyncio and have managed to make some requests with it. I wrote a function fetch_all() that takes a list of queries (URLs) and the loop previously created with asyncio as arguments, and calls the function fetch(), which gets the result of each query in JSON format:

import aiohttp
import asyncio
import ssl
import nest_asyncio
nest_asyncio.apply()

async def fetch(session, url):
    async with session.get(url, ssl=ssl.SSLContext()) as response:
        return await response.json()

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        return await asyncio.gather(*[fetch(session, url) for url in urls], return_exceptions=True)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(queries, loop))

This works properly, and I get the results of the queries in results as a list of JSONs (dictionaries). But here is my problem: sometimes, instead of the JSON, I get an error for some results (RuntimeError, aiohttp.client_exceptions.ClientConnectorError, etc.). I guess these are one-time errors, since if I redo the query individually I get the desired result. Hence, I came up with a while loop that checks which results are not dictionaries and redoes their queries: I initialize repeat_queries, error_index and results with the queries and their indices, and call run_until_complete(). Then I save each result that is a dictionary and update the list of the queries that are left and their indices:

repeat_queries = queries
error_index = list(range(len(repeat_queries)))
results = error_index

while error_index:
    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        repeat_results = loop.run_until_complete(fetch_all(repeat_queries, loop))
    for i, rr in zip(error_index, repeat_results):
        results[i] = rr
    error_index = [i for i in range(len(results)) if not isinstance(results[i], dict)]
    repeat_queries = [repeat_queries[i] for i in error_index]

However, since the asyncio loop is asynchronous, the error_index and repeat_queries updates are executed before run_until_complete() is done, and the loop keeps running with queries that were already issued in previous iterations, resulting in an (almost) infinite while loop.

Therefore, my question is:
Is there any way to force some code to be executed after loop.run_until_complete() has finished?
I have seen some similar questions on Stack Overflow, but I haven't been able to apply any of their answers.

Recommended answer

I would do this in a different way.

I would run the loop inside fetch() with try/except to catch the exception and repeat the request.

Because some problems can never give a result, a while loop may run forever, so I would rather use for _ in range(3) to try only three times.

I would also return the url from fetch() so it would be easier to find the urls which don't give a result.

import aiohttp
import asyncio
import ssl

async def fetch(session, url):
    exception = None
    
    for number in range(3):  # try only 3 times
        try:
            async with session.get(url, ssl=ssl.SSLContext()) as response:
                data = await response.json()
                #print('data:', data)
                return url, data
        except Exception as ex:
            print('[ERROR] {} | {} | {}'.format(url, number+1, ex))
            exception = ex
            
    return url, exception

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        return await asyncio.gather(*[fetch(session, url) for url in urls], return_exceptions=True)


queries = [
    'https://httpbin.org/get',
    'https://toscrape.com',
    'https://fake.domain/'
]

if __name__ == '__main__':
    
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(queries, loop))

    #print(results)
    
    print('--- results ---')
    
    for url, result in results:
        print('url:', url)
        print('result:', result)
        print('is dict:', isinstance(result, dict))
        print('type:', type(result))
        print('---')

Result:

[ERROR] https://fake.domain/ | 1 | Cannot connect to host fake.domain:443 ssl:<ssl.SSLContext object at 0x7f3902afc2c0> [Name or service not known]
[ERROR] https://fake.domain/ | 2 | Cannot connect to host fake.domain:443 ssl:<ssl.SSLContext object at 0x7f3902afc440> [Name or service not known]
[ERROR] https://fake.domain/ | 3 | Cannot connect to host fake.domain:443 ssl:<ssl.SSLContext object at 0x7f3902afc9c0> [Name or service not known]
[ERROR] https://toscrape.com | 1 | 0, message='Attempt to decode JSON with unexpected mimetype: text/html', url=URL('https://toscrape.com')
[ERROR] https://toscrape.com | 2 | 0, message='Attempt to decode JSON with unexpected mimetype: text/html', url=URL('https://toscrape.com')
[ERROR] https://toscrape.com | 3 | 0, message='Attempt to decode JSON with unexpected mimetype: text/html', url=URL('https://toscrape.com')
--- results ---
url: https://httpbin.org/get
result: {'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'Python/3.8 aiohttp/3.7.4.post0', 'X-Amzn-Trace-Id': 'Root=1-60e5c00e-45aae85e78277e5122b262c9'}, 'origin': '83.11.175.159', 'url': 'https://httpbin.org/get'}
is dict: True
type: <class 'dict'>
---
url: https://toscrape.com
result: 0, message='Attempt to decode JSON with unexpected mimetype: text/html', url=URL('https://toscrape.com')
is dict: False
type: <class 'aiohttp.client_exceptions.ContentTypeError'>
---
url: https://fake.domain/
result: Cannot connect to host fake.domain:443 ssl:<ssl.SSLContext object at 0x7f3902afc9c0> [Name or service not known]
is dict: False
type: <class 'aiohttp.client_exceptions.ClientConnectorError'>
---
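
The retry-inside-fetch() idea can also be tried without any network access. The following is only a minimal sketch under stated assumptions: fake_request() and the failures_left table are hypothetical stand-ins for session.get(), and the "urls" are just labels. It shows that each query is retried at most three times and that the queries which still fail can be picked out afterwards.

```python
import asyncio

# Hypothetical stand-in for a flaky server: each "url" fails this many
# times before it starts returning a dict (like response.json() would).
failures_left = {"ok": 0, "flaky": 2, "broken": 99}

async def fake_request(url):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if failures_left[url] > 0:
        failures_left[url] -= 1
        raise ConnectionError("simulated failure for " + url)
    return {"url": url}

async def fetch(url, attempts=3):
    exception = None
    for _ in range(attempts):  # try only `attempts` times
        try:
            return url, await fake_request(url)
        except Exception as ex:
            exception = ex  # remember the last error and try again
    return url, exception

async def fetch_all(urls):
    # gather() returns the results in the same order as the inputs
    return await asyncio.gather(*[fetch(url) for url in urls])

results = asyncio.run(fetch_all(["ok", "flaky", "broken"]))

# urls whose final result is still an exception instead of a dict
failed = [url for url, result in results if not isinstance(result, dict)]
print(failed)
```

Here "flaky" succeeds on its third attempt, while "broken" is given up on after three attempts and ends up in failed.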


Here is a version which uses your method of looping over run_until_complete(), but I would do everything in one for loop.

And I would use for _ in range(3) to repeat it only three times.

This works, but the previous version seems much simpler.

import aiohttp
import asyncio
import ssl

async def fetch(session, url):
    async with session.get(url, ssl=ssl.SSLContext()) as response:
        return await response.json()

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        return await asyncio.gather(*[fetch(session, url) for url in urls], return_exceptions=True)

queries = [
    'https://httpbin.org/get',
    'https://httpbin.org/json',
    'https://toscrape.com',
    'https://fake.domain/'
]

if __name__ == '__main__':

    # you can get it once
    loop = asyncio.get_event_loop()

    # original all queries
    all_queries = queries
    # places for all results  
    all_results = [None] * len(all_queries)
    
    # selected indexes at start
    indexes = list(range(len(all_queries)))
        
    for number in range(3):
        # selected queries
        queries = [all_queries[idx] for idx in indexes]
        
        # selected results
        results = loop.run_until_complete(fetch_all(queries, loop))
        
        print('\n--- try:', number+1, '--- results:', len(results), '---\n')
        
        new_indexes = []
        
        for idx, url, result in zip(indexes, queries, results):
            all_results[idx] = result
            if not isinstance(result, dict):
                new_indexes.append(idx)

            print('url:', url)
            print('result:', result)    
            print('is dict:', isinstance(result, dict))
            print('type:', type(result))
            print('---')
                
        # selected indexes after filtering out correct results
        indexes = new_indexes             
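
As a side note that is not part of the original answer: run_until_complete() blocks the calling code until the coroutine has finished and then returns its result, so statements placed after it already run afterwards. This suggests that the endless loop in the question more likely comes from the index bookkeeping (error_index is computed over the full results list but then used to index the already shortened repeat_queries) and from errors that never go away. A minimal sketch of the blocking behaviour, with a hypothetical work() coroutine standing in for fetch_all():

```python
import asyncio

events = []  # records the order in which things happen

async def work():
    # hypothetical stand-in for fetch_all()
    events.append("work started")
    await asyncio.sleep(0.01)
    events.append("work finished")
    return 42

loop = asyncio.new_event_loop()
try:
    # the call below does not return until work() is done
    value = loop.run_until_complete(work())
    events.append("code after run_until_complete")
finally:
    loop.close()

print(value)
print(events)
```

The entry added after run_until_complete() always comes last in events, because the call only returns once work() has finished.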
