Nested Selectors in Scrapy

Question

I have trouble getting nested Selectors to work as described in the documentation of Scrapy (http://doc.scrapy.org/en/latest/topics/selectors.html)

This is what I've got:

sel = Selector(response)
level3fields = sel.xpath('//ul/something/*')

for element in level3fields:
    site = element.xpath('/span').extract()

When I print out "element" in the loop I get <Selector xpath='stuff seen above' data=u'<span class="something">text</span>'>

Now I have two problems:

  1. Firstly, within the element there should also be an "a"-node (as in <a href), but it doesn't show up in the printout; it only shows up if I extract it directly. Is that just a printing issue, or does the element Selector not hold the a-node (without extraction)?

  2. Secondly, when I print out "site" above, it should show a list with the span-nodes. However, it doesn't; it only prints out an empty list.

I tried various combinations of changes (from multiple slashes to none, and stars (*) in different places), but none of them brought me any closer.

Essentially, I just want to get a nested Selector which gives me the span-node in the second step (the loop).

Does anybody have any hints?

Answer

Regarding your first question, it's just a print "error". __repr__ and __str__ methods on Selectors only print the first 40 characters of the data (element represented as HTML/XML or text content). See https://github.com/scrapy/scrapy/blob/master/scrapy/selector/unified.py#L143
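
The truncation is easy to verify with a minimal, self-contained sketch; the HTML string and the class name below are made up for illustration and are not taken from the asker's actual page:

from scrapy.selector import Selector

# Stand-in markup; the real page is not shown in the question.
html = '<ul><li><span class="something">text<a href="http://example.com">link</a></span></li></ul>'
sel = Selector(text=html)

span = sel.xpath('//span')[0]
print(span)            # the repr truncates the data, so the nested <a> node seems to be missing
print(span.extract())  # the full markup, including <a href=...>, is returned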

level3fields 上的循环中,您应该使用相对 XPath 表达式.使用 /span 将直接在根节点下查找 span 元素,我猜这不是你想要的.

In your loop on level3fields you should use relative XPath expressions. Using /span will look for span elements directly under the root node, which is probably not what you want.

Try this:

from scrapy.selector import Selector

sel = Selector(response)
# select the container elements first (no trailing /* needed)
level3fields = sel.xpath('//ul/something')

for element in level3fields:
    # './/span' is evaluated relative to the current element, unlike '/span'
    site = element.xpath('.//span').extract()
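
For comparison, here is a small sketch with made-up markup (the <ul>/<li> structure is an assumption, since the real HTML is not shown) that contrasts the three forms of the expression:

from scrapy.selector import Selector

# Hypothetical markup standing in for the real page.
html = '<ul><li class="something"><span>one</span></li><li class="something"><span>two</span></li></ul>'
sel = Selector(text=html)

for element in sel.xpath('//ul/li'):
    print(element.xpath('/span').extract())    # [] - absolute path, evaluated from the document root
    print(element.xpath('span').extract())     # direct <span> children of the current <li>
    print(element.xpath('.//span').extract())  # any <span> descendant of the current <li>

In short, start the expression with a dot (or no slash at all) so it is evaluated relative to the element the loop is currently on.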
