How to add non-ASCII characters in an XPath in Scrapy


Problem description

I have the following XPath:

bathroom = response.xpath(".//div[1][contains(., 'Baños’)]/text()").extract_first()

I get this error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I've tried the solutions given in these other similar questions:

Filtering out certain bytes in Python

Scrapy XPath UTF-8 literals

But none of them resolved my problem.

Note: with the solution from the first link, I replaced 'input_string' with, say, word = "baños", and I got an error like "the function has one argument, 2 given..."

Can anyone help?

Recommended answer

Besides the literal Baños, your code snippet contains invalid string-literal delimiters (both a single and a double quote, marked below), which will cause a different error. For example, the quote that closes 'Baños is a typographic ’ rather than a straight ':

bathroom = response.xpath(".//div[1][contains(., 'Baños’)]/text()").extract_first()
                          ^                            ^

Converting the entire XPath expression to unicode, as suggested in the second link, and fixing the two quotes pointed out above should fix the initial errors. Below is a quick test using lxml (which Scrapy uses under the hood):

>>> from lxml import etree
>>> root = etree.fromstring('<root/>')
>>> root.xpath(u".//div[1][contains(., 'Baños')]/text()")
[]
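
The two fixes described above (making the expression a unicode string and replacing the typographic quotes) could be sketched roughly as follows. This assumes Python 3, where str is already unicode. `normalize_xpath` and `TYPOGRAPHIC_QUOTES` are hypothetical names introduced for illustration and are not part of Scrapy or lxml.

```python
# Hypothetical helper (not part of Scrapy or lxml): returns a text (unicode)
# XPath expression with typographic quotes replaced by straight ones.
TYPOGRAPHIC_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def normalize_xpath(expr, encoding="utf-8"):
    # Byte strings are decoded so that the expression is always text.
    if isinstance(expr, bytes):
        expr = expr.decode(encoding)
    # str.translate maps each typographic quote to its straight equivalent.
    return expr.translate(str.maketrans(TYPOGRAPHIC_QUOTES))


# The expression from the question, as UTF-8 bytes with the typographic quote.
broken = ".//div[1][contains(., 'Baños\u2019)]/text()".encode("utf-8")
fixed = normalize_xpath(broken)
print(fixed)
```

The cleaned expression could then be passed to response.xpath(...) as in the question. Note that this blanket replacement would also alter typographic quotes that legitimately belong inside the text being matched, so it is only a sketch for the case shown here.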
