How to add non-ASCII characters in XPath, in Scrapy
Problem description
I have the following XPath:
bathroom = response.xpath(".//div[1][contains(., 'Baños’)]/text()").extract_first()
I get this error:
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
I've tried the solutions given in these other similar questions:
but none of them has resolved my problem!
Note: with the solution from the first link, I replaced the 'input_string' with, let's say, word = "baños", and I got an error like "the function has one argument, 2 given..."
Can anyone help?
Recommended answer
Besides the literal Baños, your code snippet contains invalid string literal delimiters, which will cause a different error. As posted, the literal opens with a straight single quote (') but closes with a typographic quote (’):
bathroom = response.xpath(".//div[1][contains(., 'Baños’)]/text()").extract_first()
                                                 ^     ^
Converting the entire XPath expression to unicode, as suggested in the second link, and fixing the two quotes pointed out above should fix the initial errors. Below is a quick test using lxml (which Scrapy uses under the hood):
>>> from lxml import etree
>>> root = etree.fromstring('<root/>')
>>> root.xpath(u".//div[1][contains(., 'Baños')]/text()")
[]
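The quick test above returns an empty list only because `<root/>` contains no `div`. As a minimal sketch of both fixes together, assuming Python 3 (where every `str` is already Unicode, so the `u` prefix is optional), the snippet below repairs the quotes and runs the expression against a small document. The `normalize_quotes` helper and the sample markup are hypothetical, made up for illustration; they are not part of Scrapy or lxml.

```python
from lxml import etree

# Hypothetical helper (not part of Scrapy or lxml): map typographic quotes
# to the straight quotes that XPath accepts as string delimiters.
TYPOGRAPHIC_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(xpath_expr):
    """Return xpath_expr with typographic quotes replaced by straight ones."""
    return xpath_expr.translate(TYPOGRAPHIC_QUOTES)


# Made-up sample markup, only for illustration.
SAMPLE_XML = (
    "<html><body><section>"
    "<div>Baños: 2</div><div>Cocina</div>"
    "</section></body></html>"
)

# The expression from the question, including its typographic closing quote.
broken = ".//div[1][contains(., 'Baños\u2019)]/text()"
fixed = normalize_quotes(broken)

root = etree.fromstring(SAMPLE_XML)
result = root.xpath(fixed)

print(fixed)
print(result)
```

Note that this helper replaces every typographic quote in the expression, including any that are meant to be matched inside a string literal, so in practice it may be simpler to retype the quotes by hand in the source file.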