Asynchronous slower than synchronous

Question

My program does the following:

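  • Gets a folder of txt files
  • For each file:
    • reads the file
    • makes a POST request to an API on localhost using the file contents
    • parses the XML response (not in the example below)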

I was concerned about the performance of the synchronous version of the program, so I tried to use aiohttp to make it asynchronous (it's my first attempt at async programming in Python besides Scrapy). It turned out that the async code took twice as long, and I don't understand why.

SYNCHRONOUS CODE (152 seconds)

    import json
    import os
    from glob import glob

    import requests

    url = "http://localhost:6090/api/analyzexml"
    package = "..."  # name of the package I send in each request
    with open("template.txt", "r", encoding="utf-8") as f:
        template = f.read()

    articles_path = "..."  # location of my text files

    def fetch(session, url, article_text):
        # Fill the request template with the article text and POST it as JSON
        data = {"package": package, "data": template.format(article_text)}
        response = session.post(url, data=json.dumps(data))
        print(response.text)

    files = glob(os.path.join(articles_path, "*.txt"))

    # One requests.Session reuses the connection across all POSTs
    with requests.Session() as s:
        for file in files:
            with open(file, "r", encoding="utf-8") as f:
                article_text = f.read()
            fetch(s, url, article_text)
    

Profiling results:

    +--------+---------+----------+---------+----------+-------------------------------------------------------+
    | ncalls | tottime | percall  | cumtime | percall  |               filename:lineno(function)               |
    +--------+---------+----------+---------+----------+-------------------------------------------------------+
    |    849 |   145.6 |   0.1715 |   145.6 |   0.1715 | ~:0(<method 'recv_into' of '_socket.socket' objects>) |
    |      2 |   1.001 |   0.5007 |   1.001 |   0.5007 | ~:0(<method 'connect' of '_socket.socket' objects>)   |
    |    365 |   0.772 | 0.002115 |   1.001 | 0.002742 | ~:0(<built-in method builtins.print>)                 |
    +--------+---------+----------+---------+----------+-------------------------------------------------------+
    

(WANNABE) ASYNCHRONOUS CODE (327 seconds)

    import asyncio

    from aiohttp import ClientSession

    # url, package, template, articles_path, json, os and glob as in the synchronous version

    async def fetch(session, url, article_text):
        data = {"package": package, "data": template.format(article_text)}
        async with session.post(url, data=json.dumps(data)) as response:
            return await response.text()

    async def process_files(articles_path):
        tasks = []

        async with ClientSession() as session:
            files = glob(os.path.join(articles_path, "*.txt"))
            for file in files:
                with open(file, "r", encoding="utf-8") as f:
                    article_text = f.read()
                task = asyncio.ensure_future(fetch(session=session,
                                                   url=url,
                                                   article_text=article_text))
                tasks.append(task)
                responses = await asyncio.gather(*tasks)
                print(responses)


    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(process_files(articles_path))
    loop.run_until_complete(future)
    

Profiling results:

    +--------+---------+---------+---------+---------+-----------------------------------------------+
    | ncalls | tottime | percall | cumtime | percall |           filename:lineno(function)           |
    +--------+---------+---------+---------+---------+-----------------------------------------------+
    |   2278 |     156 | 0.06849 |     156 | 0.06849 | ~:0(<built-in method select.select>)          |
    |    365 |   128.3 |  0.3516 |   168.9 |  0.4626 | ~:0(<built-in method builtins.print>)         |
    |    730 |   40.54 | 0.05553 |   40.54 | 0.05553 | ~:0(<built-in method _codecs.charmap_encode>) |
    +--------+---------+---------+---------+---------+-----------------------------------------------+
    

I'm clearly missing something about this concept. Could someone also help me understand why print takes so much time in the async version (see the profiling results)?

Recommended answer

Because it's not asynchronous :)

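Look at your code: you call responses = await asyncio.gather(*tasks) for every file, so each iteration waits for the request it just scheduled before moving on to the next file. You are basically fetching synchronously while still paying the full overhead of coroutine handling every time.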
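
The effect can be reproduced without the real API. In this minimal sketch the POST is replaced by asyncio.sleep; fake_fetch, DELAY and the helper names are hypothetical stand-ins, not the question's endpoint. It times both placements of the gather call: with gather inside the loop the total time grows as roughly n × DELAY, while with gather after the loop the simulated requests overlap.

```python
import asyncio
import time

DELAY = 0.05  # hypothetical: stands in for one POST round trip to the local API


async def fake_fetch(i):
    # Simulated request: sleeping yields to the event loop, like awaiting network I/O
    await asyncio.sleep(DELAY)
    return f"response {i}"


async def gather_inside_loop(n):
    # Mirrors the question's indentation: gather is awaited once per file,
    # so each new task must finish before the next one is even created
    tasks = []
    for i in range(n):
        tasks.append(asyncio.ensure_future(fake_fetch(i)))
        responses = await asyncio.gather(*tasks)
    return responses


async def gather_after_loop(n):
    # Corrected indentation: schedule every task first, then await them all once
    tasks = [asyncio.ensure_future(fake_fetch(i)) for i in range(n)]
    return await asyncio.gather(*tasks)


def timed(coro_fn, n):
    # Run one variant in a fresh event loop and measure wall-clock time
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (gather_inside_loop, gather_after_loop):
        responses, elapsed = timed(fn, 20)
        print(f"{fn.__name__}: {len(responses)} responses in {elapsed:.2f} s")
```

Both variants return the same responses in the same order; only the wall-clock time differs.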

I suppose it's just an indentation error: if you unindent responses = await asyncio.gather(*tasks) (together with print(responses)) so that it comes after the for file in files loop, the tasks will really run concurrently.
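
The same indentation slip probably also explains the print timing asked about in the question. In the async version print(responses) sits inside the loop too, and responses holds every result gathered so far, so the i-th call prints i responses: 1 + 2 + ... + 365 = 66,795 response bodies, compared with the 365 printed by the synchronous version (both profiles show 365 print calls). The 730 _codecs.charmap_encode calls, two per print, fit print writing the text and the line ending separately through a charmap console encoding such as a Windows code page; that last point is an inference from the profile, not something the question states. This sketch counts the printed responses using a hypothetical placeholder string and an in-memory buffer instead of the console.

```python
import io


def count_printed_responses(n_files, print_inside_loop, response_text="<result/>"):
    """Count how many response bodies reach print() for n_files files.

    response_text is a hypothetical placeholder for one XML response;
    output goes to an in-memory buffer instead of the console.
    """
    out = io.StringIO()
    responses = []
    for _ in range(n_files):
        # models gather() returning the results of every task scheduled so far
        responses.append(response_text)
        if print_inside_loop:
            print(responses, file=out)  # question's indentation: whole list each time
    if not print_inside_loop:
        print(responses, file=out)  # corrected indentation: print once
    return out.getvalue().count(response_text)


print(count_printed_responses(365, print_inside_loop=True))   # 66795
print(count_printed_responses(365, print_inside_loop=False))  # 365
```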
