Why can't Scrapy's XPath find what my browser's XPath finds?

Problem description

I want to find something with XPath on a page (my first Scrapy project), for example the page https://github.com/rg3/youtube-dl/pull/11272.

In both Opera's Inspect tool and the Firefox TryXPath add-on, this XPath expression gives the same result:

//div[@class='file js-comment-container js-resolvable-timeline-thread-container has-inline-notes']

and it matches the element I am looking for.

But in Scrapy 1.6, when I try to get the result of the same XPath, it does not find anything and just returns an empty list:

def parse(self, response):
    print(response.xpath('''//div[@class='file js-comment-container js-resolvable-timeline-thread-container has-inline-notes']'''))

and the result is just [].

What do you think the problem is, and how can I fix it? Thanks in advance.

Note: yes, I know about robots.txt, and I have even set ROBOTSTXT_OBEY = False.

Solution

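It seems that some of those classes are added by JavaScript in the browser. Scrapy does not execute JavaScript, so the raw HTML it downloads does not contain the full class string, and an exact @class comparison fails.
However, with a suitable selector you can still select the divs you are targeting, even though the JavaScript is never executed: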

>>> fetch('https://github.com/rg3/youtube-dl/pull/11272')
2019-02-09 14:50:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://github.com/rg3/youtube-dl/pull/11272> (referer: None)
>>> response.css('div.file')
[<Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
 <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>]
>>> len(_)
9
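The difference between the two selectors can be reproduced offline with the standard library. The sketch below is a minimal illustration, not Scrapy code: it parses a small hypothetical HTML snippet (the class lists in it are assumptions, with `has-inline-notes` assumed to be one of the JavaScript-added classes) and compares an exact `@class` comparison, like the question's XPath, against the whitespace-separated token test that Scrapy generated for `div.file`. The names `has_class_token`, `DivCollector`, `RAW_HTML` and `BROWSER_CLASS` are hypothetical.

```python
from html.parser import HTMLParser


def has_class_token(class_attr, token):
    """Mimic the XPath Scrapy generated for the CSS selector `div.file`:
    contains(concat(' ', normalize-space(@class), ' '), ' file ')
    """
    # normalize-space(): strip the ends and collapse whitespace runs to one space
    normalized = " ".join(class_attr.split())
    # Padding both sides with spaces means only a whole class name matches
    return f" {token} " in f" {normalized} "


class DivCollector(HTMLParser):
    """Collect the class attribute of every <div> in a document."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # attrs is a list of (name, value) pairs; a missing class becomes ""
            self.div_classes.append(dict(attrs).get("class") or "")


# Hypothetical raw HTML as served to a client that does not run JavaScript.
# ASSUMPTION: "has-inline-notes" is only added later by JavaScript in the browser.
RAW_HTML = """
<div class="file js-comment-container js-resolvable-timeline-thread-container">a</div>
<div class="file js-comment-container">b</div>
<div class="comment">c</div>
"""

# The full class string the browser's inspector shows after JavaScript has run
BROWSER_CLASS = ("file js-comment-container js-resolvable-timeline-thread-container "
                 "has-inline-notes")

collector = DivCollector()
collector.feed(RAW_HTML)

# Like //div[@class='...']: the whole attribute value must match exactly
exact = [c for c in collector.div_classes if c == BROWSER_CLASS]
# Like response.css('div.file'): only the "file" class name must be present
token = [c for c in collector.div_classes if has_class_token(c, "file")]

print(len(exact), len(token))  # 0 2
```

In a spider, the same idea means passing the token-matching expression to `response.xpath()`, for example `response.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), ' file ')]")`, or simply using `response.css('div.file')` as in the answer. Matching a single stable class name is less brittle than comparing the whole attribute, because any class added or removed by JavaScript breaks an exact comparison.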
