scrapy authentication login with cookies
Question
I am new to scrapy and decided to try it out because of the good online reviews. I am trying to log in to a website with scrapy. I have successfully logged in with a combination of selenium and mechanize by collecting the needed cookies with selenium and adding them to mechanize. Now I am trying to do something similar with scrapy and selenium, but I can't seem to get anything to work. I can't really even tell if anything is working or not. Can anyone please help me? Below is what I've started on. I may not even need to transfer the cookies with scrapy, but I can't tell if the thing ever actually logs in or not. Thanks.
from scrapy.spider import BaseSpider
from scrapy.http import Response, FormRequest, Request
from scrapy.selector import HtmlXPathSelector
from selenium import webdriver

class MySpider(BaseSpider):
    name = 'MySpider'
    start_urls = ['http://my_domain.com/']

    def get_cookies(self):
        driver = webdriver.Firefox()
        driver.implicitly_wait(30)
        base_url = "http://www.my_domain.com/"
        driver.get(base_url)
        driver.find_element_by_name("USER").clear()
        driver.find_element_by_name("USER").send_keys("my_username")
        driver.find_element_by_name("PASSWORD").clear()
        driver.find_element_by_name("PASSWORD").send_keys("my_password")
        driver.find_element_by_name("submit").click()
        cookies = driver.get_cookies()
        driver.close()
        return cookies

    def parse(self, response, my_cookies=get_cookies):
        return Request(url="http://my_domain.com/",
                       cookies=my_cookies,
                       callback=self.login)

    def login(self, response):
        return [FormRequest.from_response(response,
                formname='login_form',
                formdata={'USER': 'my_username', 'PASSWORD': 'my_password'},
                callback=self.after_login)]

    def after_login(self, response):
        hxs = HtmlXPathSelector(response)
        print hxs.select('/html/head/title').extract()
Answer
Your question is more of a debugging issue, so my answer will just have some notes on your question, not an exact answer.
    def parse(self, response, my_cookies=get_cookies):
        return Request(url="http://my_domain.com/",
                       cookies=my_cookies,
                       callback=self.login)
my_cookies=get_cookies: you are assigning a function here, not the result it returns. I think you don't need to pass any function here as a parameter at all. It should be:
    def parse(self, response):
        return Request(url="http://my_domain.com/",
                       cookies=self.get_cookies(),
                       callback=self.login)
The cookies argument for Request should be a dict - please verify it is indeed a dict.
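This is likely the actual problem: selenium's driver.get_cookies() returns a list of dicts (each with keys like 'name', 'value', 'domain'), not the {name: value} dict form shown in the Scrapy Request docs. A minimal conversion sketch, assuming that list shape (to_scrapy_cookies is a hypothetical helper name, not part of either library):

```python
def to_scrapy_cookies(selenium_cookies):
    # selenium returns e.g. [{'name': 'sid', 'value': 'abc', 'domain': ...}, ...];
    # collapse it to the plain {name: value} dict that Request accepts
    return {c['name']: c['value'] for c in selenium_cookies}
```

You would then pass cookies=to_scrapy_cookies(self.get_cookies()) when building the Request.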
I can't really even tell if anything is working or not.
Put some prints in the callbacks to follow the execution.
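For example, a one-line trace at the top of each callback shows whether it was reached at all and what the server returned. A sketch of the idea (FakeResponse and trace are illustrative names, not part of Scrapy; in the spider you would pass the real response object):

```python
class FakeResponse(object):
    """Hypothetical stand-in for a scrapy Response, just for illustration."""
    def __init__(self, url, status):
        self.url = url
        self.status = status

def trace(callback_name, response):
    # drop a line like this at the start of parse/login/after_login to
    # see which callbacks actually fire and with what status
    line = '[%s] status=%s url=%s' % (callback_name, response.status, response.url)
    print(line)
    return line
```

If login never prints, the first Request was never scheduled; if after_login prints a 200 but the page is the login form again, the cookies were not accepted.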