Scrapy hxs.select() not selecting all results
Question
I am trying to use scrapy to scrape odds from here.
Currently I'm just trying to log the results with the following spider:
def parse(self, response):
    log.start("LogFile.txt", log.DEBUG)
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="fb_day_type_wrapper"]')
    items = []
    for site in sites:
        siteAddress = urlparse.urljoin(response.url, site.extract())
        self.log('Found category url: %s' % siteAddress)
This only logs the entry "This market is currently unavailable...", not the other elements which contain the odds.
I have tried a few different selectors with no luck. It looks like once I try to get inside the element div[@class="fb_day_type_wrapper"], I get nothing returned. I have the same results using the scrapy shell.
Answer
The site uses javascript to generate the data table. There are some alternatives, such as scrapyjs or splash, that allow you to get the js-rendered html page. If you only need to scrape one page, you might be better off using Selenium.
Otherwise, you might need to go into hardcore mode and reverse engineer what the site is doing with the data. I will show you how to do that.
First, start the scrapy shell so we can explore the web page:
scrapy shell http://www.paddypower.com/football/football-matches/premier-league
Note: I'm using python 2.7.4, ipython 0.13.2 and scrapy 0.18.0.
If you search the page source for "Crystal Palace v Fulham" in your browser, you will see there is javascript code that has that reference. The <script> block looks like:
document.bodyOnLoad.push(function() {
lb_fb_cpn_init(
"",
"html",
"MR_224",
{category: 'SOCCER',
We look up this element in the shell:
In [1]: hxs.select('//script[contains(., "lb_fb_cpn_init")]')
Out[1]: [<HtmlXPathSelector xpath='//script[contains(., "lb_fb_cpn_init")]' data=u'<script type="text/javascript">\n/* $Id: '>]
If you look at the lb_fb_cpn_init arguments, you will see that the data we are looking for is passed as an argument in this form:
[{names: {en: 'Newcastle v Liverpool'}, ...
In fact there are three arguments like that:
In [2]: hxs.select('//script[contains(., "lb_fb_cpn_init")]').re('\[{names:')
Out[2]: [u'[{names:', u'[{names:', u'[{names:']
So we extract all of them; notice that we use a lot of regular expressions:
In [3]: js_args = hxs.select('//script[contains(., "lb_fb_cpn_init")]').re(r'(\[{names:(?:.+?)\]),')
In [4]: len(js_args)
Out[4]: 3
The idea here is that we want to parse the javascript code (which is a literal object) into python code (a dict). We could use json.loads, but to do so the js code must be a valid json object, that is, have its field names and strings enclosed in double quotes.
We proceed to do so. First, I join the arguments into a single string as a javascript list:
In [5]: args_raw = '[{}]'.format(', '.join(js_args))
Then we enclose the field names in double quotes and replace single quotes with double quotes:
In [6]: import re
In [7]: args_json = re.sub(r'(,\s?|{)(\w+):', r'\1"\2":', args_raw).replace("'", '"')
This might not always work, as the javascript code might have patterns that are not so easy to fix with a single re.sub and/or .replace.
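To see concretely what that substitution does, here is the same transformation applied to a tiny hand-written javascript-style literal (the input string is invented for this example):

```python
import json
import re

# A small javascript-style object literal (made up for this example)
args_raw = "[{names: {en: 'Newcastle v Liverpool'}, ev_id: 5889932}]"

# Quote the bare field names, then swap single quotes for double quotes
args_json = re.sub(r'(,\s?|{)(\w+):', r'\1"\2":', args_raw).replace("'", '"')

# The result is now valid json
data = json.loads(args_json)
print(data[0]['names']['en'])  # Newcastle v Liverpool
```

The backreferences `\1"\2":` keep the separator (comma or opening brace) and wrap only the field name in quotes.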
We are ready to parse the javascript code as a json object:
In [8]: import json
In [9]: data = json.loads(args_json)
In [10]: len(data)
Out[10]: 3
Here, I'm just looking for the event names and odds. You can take a look at the data content to see what it looks like.
Luckily, the data seems to have a correlation:
In [11]: map(len, data)
Out[11]: [20, 20, 60]
You could also build a single dict from the three of them by using the ev_id field. I will just assume that data[0] and data[1] have a direct correlation and that data[2] contains 3 items per event. This can be easily verified with:
In [12]: map(lambda v: v['ev_id'], data[2])
Out[12]:
[5889932,
5889932,
5889932,
5889933,
5889933,
5889933,
...
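If you would rather not rely on that ordering assumption, a merge keyed on ev_id is also possible. A sketch on made-up stand-in data (the real entries carry many more fields than shown here):

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for data[2]: three odds entries per event id (invented values)
odds_flat = [
    {'ev_id': 5889932, 'names': {'en': 'Newcastle'}},
    {'ev_id': 5889932, 'names': {'en': 'Draw'}},
    {'ev_id': 5889932, 'names': {'en': 'Liverpool'}},
    {'ev_id': 5889933, 'names': {'en': 'Arsenal'}},
    {'ev_id': 5889933, 'names': {'en': 'Draw'}},
    {'ev_id': 5889933, 'names': {'en': 'Norwich'}},
]

# Group the flat list by event id; sorting first makes groupby safe
odds_by_event = {
    ev_id: list(group)
    for ev_id, group in groupby(sorted(odds_flat, key=itemgetter('ev_id')),
                                key=itemgetter('ev_id'))
}
print(len(odds_by_event[5889932]))  # 3
```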
With some python-fu, we can merge the records:
In [13]: odds = iter(data[2])
In [14]: odds_merged = zip(odds, odds, odds)
In [15]: data_merged = zip(data[0], data[1], odds_merged)
In [16]: len(data_merged)
Out[16]: 20
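The zip(odds, odds, odds) trick works because all three arguments share a single iterator, so each output tuple consumes three consecutive items. A minimal demonstration:

```python
# One iterator shared by all three zip arguments chunks the list into triples
values = [1, 2, 3, 4, 5, 6]
it = iter(values)
triples = list(zip(it, it, it))
print(triples)  # [(1, 2, 3), (4, 5, 6)]
```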
Finally, we collect the data:
In [17]: get_odd = lambda obj: (obj['names']['en'], '/'.join([obj['lp_num'], obj['lp_den']]))
In [18]: event_odds = []
In [19]: for event, _, odds in data_merged:
....: event_odds.append({'name': event['names']['en'], 'odds': dict(map(get_odd, odds)), 'url': event['url']})
....:
In [20]: event_odds
Out[20]:
[{'name': u'Newcastle v Liverpool',
'odds': {u'Draw': u'14/5', u'Liverpool': u'17/20', u'Newcastle': u'3/1'},
'url': u'http://www.paddypower.com/football/football-matches/premier-league-matches/Newcastle%2dv%2dLiverpool-5889932.html'},
{'name': u'Arsenal v Norwich',
'odds': {u'Arsenal': u'3/10', u'Draw': u'9/2', u'Norwich': u'9/1'},
'url': u'http://www.paddypower.com/football/football-matches/premier-league-matches/Arsenal%2dv%2dNorwich-5889933.html'},
{'name': u'Chelsea v Cardiff',
'odds': {u'Cardiff': u'10/1', u'Chelsea': u'1/4', u'Draw': u'5/1'},
'url': u'http://www.paddypower.com/football/football-matches/premier-league-matches/Chelsea%2dv%2dCardiff-5889934.html'},
{'name': u'Everton v Hull',
'odds': {u'Draw': u'10/3', u'Everton': u'4/9', u'Hull': u'13/2'},
'url': u'http://www.paddypower.com/football/football-matches/premier-league-matches/Everton%2dv%2dHull-5889935.html'},
{'name': u'Man Utd v Southampton',
'odds': {u'Draw': u'3/1', u'Man Utd': u'8/15', u'Southampton': u'11/2'},
'url': u'http://www.paddypower.com/football/football-matches/premier-league-matches/Man%2dUtd%2dv%2dSouthampton-5889939.html'},
...
As you can see, web scraping can be very challenging (and fun!). It all depends on how the website displays the data. Here you could save time by just using Selenium, but if you want to scrape a large website, Selenium will be very slow compared to Scrapy.
Also, you have to consider whether the site will get code updates often; in that case you will spend more time reverse engineering the js code, and a solution like scrapyjs or splash may be a better option.
Final remarks:
- Now you have all the code required to extract the data. You need to integrate this into your spider callback and build your item.
- Don't use log.start. Use the setting LOG_FILE (command line argument: --set LOG_FILE=mylog.txt).
- Remember that .extract() always returns a list.
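Putting the shell steps together, the extraction could be wrapped in one helper to call from the spider callback. A sketch (the function name and the tiny HTML stand-in below are invented for illustration; the regexes are the ones used above):

```python
import json
import re


def extract_js_args(html_text):
    """Pull the javascript literal arguments out of the page source and
    parse them into python objects, following the steps shown above."""
    # Grab every "[{names: ...}]," argument passed to lb_fb_cpn_init
    js_args = re.findall(r'(\[{names:(?:.+?)\]),', html_text)
    # Join them into one javascript list
    args_raw = '[{}]'.format(', '.join(js_args))
    # Quote bare field names and swap single quotes for double quotes
    args_json = re.sub(r'(,\s?|{)(\w+):', r'\1"\2":', args_raw).replace("'", '"')
    return json.loads(args_json)


# A tiny stand-in for the real page source (invented for the example)
html_text = ("lb_fb_cpn_init([{names: {en: 'Newcastle v Liverpool'}}], "
             "[{names: {en: 'x'}}],")
data = extract_js_args(html_text)
print(len(data))  # 2
```

In the real spider callback you would pass response.body to this helper and yield one item per merged record.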