Scrape multiple accounts aka multiple logins


Question


I can successfully scrape data for a single account. I want to scrape multiple accounts on a single website, which means multiple logins. How do I manage login/logout?

Answer


You can scrape multiple accounts in parallel by using a separate cookiejar per account session; see the "cookiejar" request meta key at http://doc.scrapy.org/en/latest/topics/downloader-middleware.html?highlight=cookiejar#std:reqmeta-cookiejar


To clarify: suppose we have an array of accounts in settings.py:

MY_ACCOUNTS = [
    {'login': 'my_login_1', 'pwd': 'my_pwd_1'},
    {'login': 'my_login_2', 'pwd': 'my_pwd_2'},
]

Here is the URL of the login page: http://example.com/login


Create a start_requests function in your spider; in it, loop over the MY_ACCOUNTS array and log in to each account:

from scrapy.http import FormRequest

def start_requests(self):
    requests = []

    for i, account in enumerate(self.crawler.settings['MY_ACCOUNTS']):
        # Submit the login form for this account
        request = FormRequest('http://example.com/login',
            formdata={'form_login_name': account['login'], 'form_pwd_name': account['pwd']},
            callback=self.parse,
            dont_filter=True)

        # Give each account its own cookiejar so sessions don't mix
        request.meta['cookiejar'] = i
        requests.append(request)

    return requests


form_login_name and form_pwd_name are the respective field names on the login form.


dont_filter=True tells Scrapy to ignore its duplicate-request filter, because here we issue several POST requests to the same login page, http://example.com/login.


request.meta['cookiejar'] = i keeps the cookies of each session (login) separate. Don't forget to pass the same cookiejar identifier along in your sub-requests; for example, suppose you want scrapy to follow on to another page after login:

def parse(self, response):
    """ make some manipulation here ... """

    # Propagate the cookiejar id so this request reuses the same session
    yield Request(my_url, meta={'cookiejar': response.meta['cookiejar']}, callback=my_callback)
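Internally, Scrapy's CookiesMiddleware keeps one http.cookiejar.CookieJar per distinct cookiejar meta value, which is what keeps the logins isolated. Here is a stdlib-only sketch of that idea (this is not Scrapy code; the cookie names and values are made up for illustration):

```python
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, value):
    # Build a minimal cookie for example.com (illustrative values only)
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

# One jar per account, keyed like request.meta['cookiejar'] = i
jars = {0: CookieJar(), 1: CookieJar()}
jars[0].set_cookie(make_cookie("sessionid", "account_1_session"))
jars[1].set_cookie(make_cookie("sessionid", "account_2_session"))

# Each jar holds only its own session cookie, so the two logins never mix
jar_contents = {i: {c.name: c.value for c in jar} for i, jar in jars.items()}
print(jar_contents)
```

Because the jars are independent, a Set-Cookie from one account's response can never leak into another account's requests, which is exactly why each login stays authenticated on its own.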

