How to add non-ASCII characters in an XPath in Scrapy

Published: 2020/9/7 20:38:23 | Tags: python xpath unicode ascii

Question

I have the following XPath:

bathroom = response.xpath(".//div[1][contains(., 'Baños’)]/text()").extract_first()

I get this error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I've tried the solutions given in these other similar questions:

Filtering out certain bytes in Python

Scrapy XPath UTF-8 literals

but none of them resolved my problem.

Note: with the solution from the first link, I replaced 'input_string' with, say, word = "baños", and got an error along the lines of "the function has one argument, 2 given...".

Can anyone help?

Recommended answer

Besides the literal Baños, your code snippet contains invalid string-literal delimiters (both a single and a double quote, marked below), which will cause a different error:

bathroom = response.xpath(".//div[1][contains(., 'Baños’)]/text()").extract_first()
                          ^                            ^
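A stray typographic quote is easy to miss by eye. One way to spot it is to list every non-ASCII character in the expression together with its Unicode name. The sketch below uses only the standard library; `find_non_ascii` is a hypothetical helper written for this illustration, not part of Scrapy or lxml, and the expression is written with escapes so that the characters are unambiguous.

```python
import unicodedata


def find_non_ascii(expr):
    """Return (index, char, Unicode name) for each non-ASCII character in expr."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(expr)
        if ord(ch) > 127
    ]


# The XPath from the question: \u00f1 is the legitimate "ñ",
# \u2019 is the typographic quote that closes the literal by mistake.
xpath_expr = ".//div[1][contains(., 'Ba\u00f1os\u2019)]/text()"

for index, char, name in find_non_ascii(xpath_expr):
    print(index, repr(char), name)
```

In this output the ñ is expected, while RIGHT SINGLE QUOTATION MARK is the character that should be replaced with a plain ASCII apostrophe.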

Converting the entire XPath expression to unicode, as suggested in the second link, and fixing the two quotes pointed out above should fix the initial errors. Below is a quick test using lxml (which Scrapy uses under the hood):

>>> from lxml import etree
>>> root = etree.fromstring('<root/>')
>>> root.xpath(u".//div[1][contains(., 'Baños')]/text()")
[]
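The test above returns an empty list only because `<root/>` has no `div` elements. As a rough follow-up sketch, the same expression can be run against a small document that does contain a match. The sample markup here is made up for illustration and is not taken from the question. On Python 3, string literals are already Unicode, so the `u` prefix is optional.

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
root = etree.fromstring(
    "<root><div>Baños: 2</div><div>Habitaciones: 3</div></root>"
)

# Both quotes around Baños are plain ASCII apostrophes.
matches = root.xpath(".//div[1][contains(., 'Baños')]/text()")
print(matches)
```

With this input the expression selects the text of the first `div`, since that element contains the string Baños.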
