Aiohttp async session requests


Problem description

So I've been scraping a website's (www.cardsphere.com) protected pages with requests, using a session, like so:

import requests

payload = {
    'email': <enter-email-here>,
    'password': <enter-site-password-here>
}

with requests.Session() as request:
    request.get(<site-login-page>)
    request.post(<site-login-here>, data=payload)
    request.get(<site-protected-page1>)
    # save stuff from page 1
    request.get(<site-protected-page2>)
    # save stuff from page 2
    .
    .
    .
    request.get(<site-protected-pageN>)
    # save stuff from page N

Now, since it's quite a few pages, I wanted to speed it up with aiohttp + asyncio... but I'm missing something. I've been able to more or less use it to scrape unprotected pages, like so:

import asyncio
import aiohttp

async def get_cards(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            data = await resp.text()
            <do-stuff-with-data>

urls = [
    'https://www.<url1>.com',
    'https://www.<url2>.com',
    .
    .
    .
    'https://www.<urlN>.com'
]

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(
        *(get_cards(url) for url in urls)
    )
)

That gave some results, but how do I do it for pages that require login? I tried adding session.post(<login-url>, data=payload) inside the async function, but that obviously didn't work out well; it just keeps logging in. Is there a way to "set up" an aiohttp ClientSession before the loop function, since I need to log in first and then, in the same session, get data from a bunch of protected links with asyncio + aiohttp?

I'm still rather new to Python, and to async even more so, and I'm missing some key concept here. If anybody could point me in the right direction I'd greatly appreciate it.

Recommended answer

This is the simplest I can come up with. Depending on what you do in <do-stuff-with-data> you may run into some other troubles regarding concurrency, and down the rabbit hole you go... Just kidding: it's a little more complicated to wrap your head around coros, promises, and tasks, but once you get it, it's as simple as sequential programming.

import asyncio
import aiohttp


async def get_cards(url, session, sem):
    # The semaphore caps how many requests run at once; the shared
    # session carries the login cookies obtained in main().
    async with sem, session.get(url) as resp:
        data = await resp.text()
        # <do-stuff-with-data>


urls = [
    'https://www.<url1>.com',
    'https://www.<url2>.com',
    'https://www.<urlN>.com'
]


async def main():
    sem = asyncio.Semaphore(100)
    async with aiohttp.ClientSession() as session:
        # Log in once; the ClientSession keeps the auth cookies for every later request.
        await session.get('auth_url')
        await session.post('auth_url', data={'user': None, 'pass': None})
        tasks = [asyncio.create_task(get_cards(url, session, sem)) for url in urls]
        results = await asyncio.gather(*tasks)
        return results


asyncio.run(main())
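
Note that asyncio.gather returns whatever each coroutine returns, in the same order as the tasks passed to it, so if get_cards hands back the parsed data instead of discarding it, main() yields one result per URL. A minimal sketch of that variant, with the page body returned as raw HTML (the actual parsing is whatever <do-stuff-with-data> stands for):

async def get_cards(url, session, sem):
    async with sem, session.get(url) as resp:
        html = await resp.text()
        # parse the cards out of `html` here; the raw text is returned for illustration
        return url, html

# results from asyncio.gather(*tasks) in main() is then a list of (url, html) tuples,
# ordered the same as `urls`.

Returning values from the coroutine also keeps shared-state mutation out of <do-stuff-with-data>, which sidesteps most of the concurrency troubles mentioned above.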
