Scrapy: why does my response object not have a body_as_unicode method?


Question

I wrote a spider that worked brilliantly the first time. The second time I ran it, it did not venture beyond the start_urls. To investigate, I fetched the URL in the Scrapy shell and tried to create an HtmlXPathSelector object from the returned response. That is when I got the error.

So the steps were:

[scrapy shell] fetch('http://example.com')  # the real URL was something other than example.com
[scrapy shell] from scrapy.selector import HtmlXPathSelector
[scrapy shell] hxs = HtmlXPathSelector(response)

---------------------------------------------------------------------------

Traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-a486208adf1e> in <module>()
----> 1 HtmlXPathSelector(response)

/home/codefreak/project-r42catalog/env-r42catalog/lib/python2.7/site-packages/scrapy/selector/lxmlsel.pyc in __init__(self, response, text, namespaces, _root, _expr)
     29                 body=unicode_to_str(text, 'utf-8'), encoding='utf-8')
     30         if response is not None:
---> 31             _root = LxmlDocument(response, self._parser)
     32 
     33         self.namespaces = namespaces

/home/codefreak/project-r42catalog/env-r42catalog/lib/python2.7/site-packages/scrapy/selector/lxmldocument.pyc in __new__(cls, response, parser)
     25         if parser not in cache:
     26             obj = object_ref.__new__(cls)
---> 27             cache[parser] = _factory(response, parser)
     28         return cache[parser]
     29 

/home/codefreak/project-r42catalog/env-r42catalog/lib/python2.7/site-packages/scrapy/selector/lxmldocument.pyc in _factory(response, parser_cls)
     11 def _factory(response, parser_cls):
     12     url = response.url
---> 13     body = response.body_as_unicode().strip().encode('utf8') or '<html/>'
     14     parser = parser_cls(recover=True, encoding='utf8')
     15     return etree.fromstring(body, parser=parser, base_url=url)

Error:

AttributeError: 'Response' object has no attribute 'body_as_unicode'

Am I overlooking something very obvious, or have I stumbled upon a bug in Scrapy?

Recommended answer

body_as_unicode is a method of TextResponse, not of the base Response class. Scrapy creates a TextResponse, or one of its subclasses such as HtmlResponse, only if the HTTP response contains textual content. Otherwise it creates a plain Response:

In [1]: fetch('http://scrapy.org')
...
In [2]: type(response)
Out[2]: scrapy.http.response.html.HtmlResponse
...
In [3]: fetch('http://www.scrapy.org/site-media/images/logo.png')
...
In [4]: type(response)
Out[4]: scrapy.http.response.Response

In your case, the most likely explanation is that Scrapy believes the response does not contain text.

Does the HTTP response from the server set the Content-Type header correctly? Does the page render correctly in a browser? The answers will help determine whether this is expected behavior or a bug.
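To make the Content-Type question concrete, here is a rough sketch that classifies a header value as textual or not. It is an approximation for illustration only, not Scrapy's actual detection logic. The function names and the set of extra textual types are assumptions.

```python
# Assumed list of non-"text/*" media types that still carry text.
# This is illustrative and is not taken from Scrapy's source.
TEXTUAL_TYPES = {
    "application/xml",
    "application/xhtml+xml",
    "application/json",
    "application/javascript",
}


def media_type(content_type_header):
    """Return the bare media type, e.g. 'text/html', lowercased.

    Parameters such as '; charset=utf-8' are dropped.
    """
    return content_type_header.split(";", 1)[0].strip().lower()


def looks_textual(content_type_header):
    """Guess whether a Content-Type header value describes textual content."""
    mtype = media_type(content_type_header)
    return mtype.startswith("text/") or mtype in TEXTUAL_TYPES


print(looks_textual("text/html; charset=utf-8"))  # True
print(looks_textual("image/png"))                 # False
print(looks_textual(""))                          # False
```

In the Scrapy shell, the header value to inspect is available from response.headers. A missing or wrong value there (for example an HTML page served as application/octet-stream) would be consistent with the behavior described in the question.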
