Error: raise ValueError("No element found in %s" % response) occurs when trying to log in with Scrapy


Question

I want to crawl some info from the BBS of my college. Here is the address: http://bbs.byr.cn. Below is the code of my spider:

from lxml import etree
import scrapy
try:
    from scrapy.spiders import Spider
except:
    from scrapy.spiders import BaseSpider as Spider
from scrapy.http import Request

class ITJobInfoSpider(scrapy.Spider):
    name = "ITJobInfoSpider"
    start_urls = ["http://bbs.byr.cn/#!login"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'method': 'post', 'id': 'username', 'passwd': 'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login
        )

    def after_login(self, response):
        print "######response body: " + response.body + "\n"
        if "authentication failed" in response.body:
            print "#######Login failed#########\n"
        return

However, with this code, I often get an error: raise ValueError("No element found in %s" % response)

I find that this error happens when Scrapy tries to parse the HTML of the URL http://bbs.byr.cn; Scrapy parses the page with lxml. Below is the code:

root = LxmlDocument(response, lxml.html.HTMLParser)
forms = root.xpath('//form')
if not forms:
    raise ValueError("No <form> element found in %s" % response)

So I looked into it with print etree.tostring(root) and found that the HTML element </form> is parsed into &lt;/form&gt;; no wonder forms = root.xpath('//form') returns an empty list.
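
For reference, here is a minimal sketch of how this check can be reproduced outside Scrapy, using requests plus lxml directly. The URL and the GBK encoding come from the question; the requests call and the printed slices are illustrative assumptions only:

import requests
import lxml.html
from lxml import etree

resp = requests.get('http://bbs.byr.cn')
resp.encoding = 'gbk'              # the page declares GBK, not UTF-8
root = lxml.html.fromstring(resp.text)
print(etree.tostring(root)[:500])  # inspect how the markup was serialized
print(root.xpath('//form'))        # an empty list reproduces the ValueError path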

But I don't know why this is happening. Maybe it is the HTML encoding? (The HTML is encoded with GBK, not UTF-8.) Thanks in advance to anyone who can help me out. BTW, if anyone wants to write code against the website, I can give you a test account; please leave me an email address in the comments.
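
As a hedged diagnostic for the encoding theory, a throwaway spider like the one below (the spider name and log messages are hypothetical, not part of the original code) can log which encoding Scrapy detected and how many <form> elements its selector actually sees:

import scrapy

class EncodingCheckSpider(scrapy.Spider):
    # Diagnostic-only sketch: logs the detected response encoding and the
    # number of <form> elements visible to Scrapy's selector.
    name = 'encoding_check'
    start_urls = ['http://bbs.byr.cn']

    def parse(self, response):
        self.logger.info('detected encoding: %s', response.encoding)
        self.logger.info('forms found: %d', len(response.xpath('//form')))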

Thanks a lot, guys!!

Answer

There seems to be some JavaScript redirection happening.
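
One rough way to confirm this, assuming the root URL only serves a small stub page that navigates via JavaScript instead of the full forum HTML, is to fetch it directly and inspect the body. The requests call and the substrings checked below are guesses for illustration, not confirmed markers:

import requests

body = requests.get('http://bbs.byr.cn').text
print(body[:500])       # a JS/meta-refresh stub is typically very short
print('<form' in body)  # False would explain the "No <form> element" error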

In this case, using Splash would be overkill, though. Simply append /index to the start URL: http://bbs.byr.cn → http://bbs.byr.cn/index

This would be the complete working spider:

from scrapy import Spider
from scrapy.http import FormRequest

class ByrSpider(Spider):
    name = 'byr'
    start_urls = ['http://bbs.byr.cn/index']

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formdata={'method':'post','id': 'username', 'passwd':'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login)

    def after_login(self, response):
        self.logger.debug(response.text)
        if 'authentication failed' in response.text:
            self.logger.debug('Login failed')
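
As a usage note, the spider can be run with scrapy crawl byr from inside a Scrapy project, or, as a rough sketch, driven from a plain script with CrawlerProcess (assuming the ByrSpider class above is defined in the same file):

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
process.crawl(ByrSpider)  # the spider class defined above
process.start()

Remember to replace the placeholder 'username' and 'password' values in formdata with real credentials before running it.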
