Scrapy xpath utf-8 literals

Views: 271 Published: 2020/7/13 5:51:52 Tags: python unicode utf-8 scrapy

Problem description

I need to check scraped fields that contain non-ASCII characters. When I include a UTF-8 literal in the spider, I get this error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Here is an example that produces the error:

# -*- coding: utf-8 -*-
import scrapy

class DummySpider(scrapy.Spider):
    name = 'dummy'
    start_urls = ['http://www.google.com']

    def parse(self, response):
        dummy = response.xpath("//*[contains(.,u'café')]")

Here is the traceback:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/tmp/stack.py", line 9, in parse
    dummy = response.xpath("//*[contains(.,u'café')]")
  File "/usr/lib/pymodules/python2.7/scrapy/http/response/text.py", line 109, in xpath
    return self.selector.xpath(query)
  File "/usr/lib/pymodules/python2.7/scrapy/selector/unified.py", line 97, in xpath
    smart_strings=self._lxml_smart_strings)
  File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:50702)
  File "xpath.pxi", line 306, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145829)
  File "apihelpers.pxi", line 1395, in lxml.etree._utf8 (src/lxml/lxml.etree.c:26485)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
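
The error message itself hints at the likely cause: under Python 2 the XPath expression in the example is a plain byte string, so `café` reaches lxml as non-ASCII UTF-8 bytes, and the `u` prefix sits inside the XPath text rather than in front of the Python literal. The sketch below only illustrates that rule. `is_accepted` is a hypothetical helper written for this example that mirrors the wording of the message; it is not lxml's actual check, and the byte string is simulated with `.encode("utf-8")` so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-

def is_accepted(value):
    """Hypothetical stand-in for the rule in the error message:
    text (unicode) is fine, byte strings must be plain ASCII, and
    neither may contain NULL bytes or control characters."""
    if isinstance(value, bytes):
        # Byte strings: only tab, newline, CR and printable ASCII pass.
        return all(b in (9, 10, 13) or 32 <= b < 127 for b in bytearray(value))
    # Text strings: reject NULL and control characters only.
    return all(ch in u"\t\n\r" or ord(ch) >= 32 for ch in value)


# Roughly what Python 2 passes along for the literal in the question:
# UTF-8 encoded bytes, with the stray u prefix inside the XPath text.
as_bytes = u"//*[contains(.,u'café')]".encode("utf-8")

# Candidate fix: make the whole expression a unicode literal and drop
# the u prefix from inside the XPath.
as_text = u"//*[contains(., 'café')]"

print(is_accepted(as_bytes))  # False
print(is_accepted(as_text))   # True
```

If that reading is right, the corresponding change in the spider would be `response.xpath(u"//*[contains(., 'café')]")`. On Python 3 every `str` literal is already unicode, so the prefix is optional there.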
