Scrapy not finding table css

Problem description

I just started using Scrapy recently and had good luck with it until this issue. I can't seem to find the standings table here:

http://www.baseball-reference.com/leagues/MLB/2016-standings.shtml#all_expanded_standings_overall

The table has the id 'expanded_standings_overall', but I can't find it with my spider or in the shell. I was able to get a result for '#all_expanded_standings_overall' because there is a div with that id. Extracting that div in the shell shows me the table I want, but even within it I cannot find the table with 'tbody', 'tr', or anything else I've tried.

Solution

If you look at the page source, you will see that the table with the id in question (expanded_standings_overall) sits inside an HTML comment:

<div class="placeholder"></div>
<!--
    <div class="table_outer_container">
        <div class="overthrow table_container" id="div_expanded_standings_overall">
            <table class="sortable stats_table" id="expanded_standings_overall" data-cols-to-freeze=2>
                <caption>MLB Detailed Standings</caption>
                ... sweet data here ..
            </table>
        </div>
    </div>
-->
</div>

The HTML comment seems to be a trick to hide the content from our innocent scraper ;)

Interestingly, Firebug does not show this comment, presumably because it displays the live DOM after the page's scripts have run rather than the raw source.

One way around the issue is to extract the comments, strip the comment markers, and parse the markup inside them as ordinary HTML. For instance:

$ scrapy shell www.baseball-reference.com/leagues/MLB/2016-standings.shtml
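>>> view(response)  # open the downloaded page in a browser
>>> from scrapy.selector import Selector
>>> sel = Selector(response)
>>> # The table is not in the parsed document, so a direct query returns nothing.
>>> sel.xpath('//table[@id="expanded_standings_overall"]')
[]
>>> import re
>>> # Capture everything between the comment markers (DOTALL lets . match newlines).
>>> regex = re.compile(r'<!--(.*)-->', re.DOTALL)
>>> # Re-parse the body of each comment and look for the table there.
>>> for comment in sel.xpath('//comment()').re(regex):
...     table = Selector(text=comment).xpath('//table[@id="expanded_standings_overall"]')
...     print(table)
...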
[]
[]
[<Selector xpath='//table[@id="expanded_standings_overall"]' data='<table class="sortable stats_table" id="'>]
[]
[]

As you can see, I prefer XPath selectors over CSS, but in principle they work the same way; see https://doc.scrapy.org/en/latest/topics/selectors.html.
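The same idea (collect the comments, then parse each comment body as HTML) does not depend on Scrapy. The following is a minimal sketch using only Python's standard-library html.parser instead of Scrapy selectors. The names CommentCollector, TableFinder, find_commented_table and SAMPLE_HTML are hypothetical, and the sample markup and its row are invented for illustration; it is not the real page.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the body of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the comment body without the <!-- and --> markers.
        self.comments.append(data)


class TableFinder(HTMLParser):
    """Find a <table> with the given id and collect the text inside it."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # > 0 while inside the wanted table
        self.found = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        if self.depth:
            self.depth += 1  # nested table inside the wanted one
        elif dict(attrs).get("id") == self.table_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.cells.append(text)


def find_commented_table(html, table_id):
    """Return the text pieces of a table hidden in a comment, or None."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    for comment in collector.comments:
        finder = TableFinder(table_id)
        finder.feed(comment)  # parse the comment body as ordinary HTML
        finder.close()
        if finder.found:
            return finder.cells
    return None


# Invented sample that mimics the structure shown above (not real data).
SAMPLE_HTML = """
<div id="all_expanded_standings_overall">
<div class="placeholder"></div>
<!--
<div class="table_outer_container">
<table class="sortable stats_table" id="expanded_standings_overall">
<caption>MLB Detailed Standings</caption>
<tr><td>Team A</td><td>100</td></tr>
</table>
</div>
-->
</div>
"""

if __name__ == "__main__":
    print(find_commented_table(SAMPLE_HTML, "expanded_standings_overall"))
```

Inside a spider callback, the text passed to such a function would be response.text. The function only looks inside comments, so a table that is present in the normal markup is deliberately not matched.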
