Behavior of the scrapy xpath selector on h1-h6 tags

Views: 286 Published: 2020/11/24 1:50:18 Tags: python html xpath scrapy selector

Problem description

Why do the following two code snippets give different outputs? The only difference between them is that the h1 tag in the first snippet is replaced with an h tag in the second. Is this because the h1 tag has a special "meaning" in HTML? I tried h1 through h6 and all of them give [] as output, while with h7 it starts giving [u'xxx'] as output.

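from scrapy import Selector  # scrapy version: 1.2.2

# Case 1: p nested inside an h1 heading
text = '<h1><p>xxx</p></h1>'
print(Selector(text=text).xpath('//h1/p/text()').extract())
# Output[1]: []

# Case 2: same markup, but with the unknown tag h instead of h1
text = '<h><p>xxx</p></h>'
print(Selector(text=text).xpath('//h/p/text()').extract())
# Output[2]: [u'xxx']  (Python 2 repr, as reported)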

Recommended answer

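According to the W3C, putting p tags inside h1-h6 headings is invalid HTML. The HTML parser therefore repairs the markup, and the p element apparently no longer ends up as a child of the heading, which is why //h1/p matches nothing. Tags such as h or h7 are not HTML elements, so this repair does not seem to apply to them. You can see more about this here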

In any case, to bypass this and work with the markup exactly as written, you can change the selector type like this:

sel = Selector(text="anyxml", type="xml")

With type="xml", the selector respects the structure exactly as written.
