Using loginform with scrapy
Question
The scrapy framework (https://github.com/scrapy/scrapy) provides a library for use when logging into websites that require authentication, https://github.com/scrapy/loginform.
I have looked through the docs for both projects; however, I cannot seem to figure out how to get scrapy to call loginform before running. The login works fine with just loginform.
Thanks
Answer
loginform is just a library, totally decoupled from Scrapy.
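Under the hood, a helper like loginform parses the login page's HTML, collects the form's fields (hidden inputs included), and substitutes the credentials into the right ones. The following is only a rough stdlib sketch of that idea, with made-up HTML and field names, not loginform's actual implementation:

```python
# Rough sketch of what a form-filling helper does: parse the login
# page, collect every <input> field (hidden ones too), then fill in
# the credential fields. The HTML and field names are made up.
from html.parser import HTMLParser

class FormFieldParser(HTMLParser):
    """Collects (name, value) pairs from <input> elements."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attrs = dict(attrs)
            if 'name' in attrs:
                # hidden inputs carry a value; text/password fields start empty
                self.fields.append((attrs['name'], attrs.get('value') or ''))

html = '''
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>
'''

parser = FormFieldParser()
parser.feed(html)
data = dict(parser.fields)
data['user'] = 'your-username'
data['pass'] = 'secret-password-here'
print(data)
```

Because the helper only needs the page URL and its HTML body, it works the same whether those come from Scrapy, requests, or anything else, which is what "decoupled" means here.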
You have to write the code to plug it into the spider you want, probably in a callback method.
Here is an example of a structure to do this:
import scrapy
from loginform import fill_login_form


class MySpiderWithLogin(scrapy.Spider):
    name = 'my-spider'

    start_urls = [
        'http://somewebsite.com/some-login-protected-page',
        'http://somewebsite.com/another-protected-page',
    ]

    login_url = 'http://somewebsite.com/login-page'
    login_user = 'your-username'
    login_password = 'secret-password-here'

    def start_requests(self):
        # let's start by sending a first request to login page
        yield scrapy.Request(self.login_url, self.parse_login)

    def parse_login(self, response):
        # got the login page, let's fill the login form...
        data, url, method = fill_login_form(response.url, response.body,
                                            self.login_user, self.login_password)
        # ... and send a request with our login data
        return scrapy.FormRequest(url, formdata=dict(data),
                                  method=method, callback=self.start_crawl)

    def start_crawl(self, response):
        # OK, we're in, let's start crawling the protected pages
        for url in self.start_urls:
            yield scrapy.Request(url)

    def parse(self, response):
        # do stuff with the logged in response
        pass
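Note that fill_login_form returns the form data as a list of (name, value) pairs, including any hidden fields the login page defines (a CSRF token, for instance), which is why parse_login converts it with dict(data) before building the FormRequest. A minimal illustration, with made-up field names and values:

```python
# Form data in the shape fill_login_form returns it: a list of
# (name, value) pairs. The field names and token value are made up.
data = [
    ('csrf_token', 'abc123'),             # hidden field picked up from the form
    ('username', 'your-username'),        # filled-in credential
    ('password', 'secret-password-here'), # filled-in credential
]

# FormRequest's formdata argument expects a mapping
formdata = dict(data)
print(formdata)
```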