Beginner to Scraping, keep on getting empty lists


Question


I've decided to take a swing at web scraping using Python (with lxml and requests). The webpage I'm trying to scrape to learn is: http://www.football-lineups.com/season/Real_Madrid/2013-2014


What I want to scrape is the table on the left of the webpage (the table with the scores and formations used). Here is the code I'm working with:

from lxml import html
import requests

page = requests.get("http://www.football-lineups.com/season/Real_Madrid/2013-2014")
tree = html.fromstring(page.text)
# XPath copied from Chrome's element inspector
competition = tree.xpath('//*[@id="sptf"]/table/tbody/tr[2]/td[4]/font/text()')
print(competition)


The xpath that I input is the xpath that I copied over from Chrome. The code should normally return the competition of the first match in the table (i.e. La Liga). In other words, it should return the second row, fourth column entry (there is a random second column on the web layout, I don't know why). However, when I run the code, I get back an empty list. Where might this code be going wrong?

Answer


If you inspect the raw source of the page you will see that the lineup table is not there. It is filled in after the page loads, via AJAX, so you won't be able to fetch it simply by requesting http://www.football-lineups.com/season/Real_Madrid/2013-2014: the JavaScript won't be interpreted, and thus the AJAX call is never executed.
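You can confirm this yourself with a quick sketch: search the raw HTML that requests receives for the table's container id (`sptf`). The live call is shown commented out; the helper itself is a plain substring check, illustrated offline with toy HTML:

```python
def present_in_raw_html(html_text, marker):
    """True if marker occurs in the raw page source, i.e. before any JS runs."""
    return marker in html_text

# Live check (uncomment to run against the real site):
# import requests
# page = requests.get("http://www.football-lineups.com/season/Real_Madrid/2013-2014")
# print(present_in_raw_html(page.text, 'id="sptf"'))  # expected False: AJAX-loaded

# Offline illustration with toy HTML standing in for the fetched page:
raw_html = "<html><body><div id='content'>loading...</div></body></html>"
print(present_in_raw_html(raw_html, 'id="sptf"'))  # False
```

If this prints False, no XPath expression against `page.text` can ever match the table, no matter how carefully it is copied from Chrome.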

The AJAX request is the following:


Maybe you can forge that request to get this data. I'll let you work out what those well-named dX arguments are :)
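As a minimal sketch of what "forging the request" might look like: build the same POST the browser sends, with the `X-Requested-With` header many AJAX endpoints check. The endpoint URL and the dX parameter values below are assumptions for illustration — copy the real ones from the Network tab of the browser's developer tools:

```python
import requests

def build_ajax_request(url, params):
    """Prepare (without sending) a POST that mimics the browser's AJAX call."""
    req = requests.Request(
        "POST",
        url,
        data=params,
        # Many endpoints use this header to tell AJAX calls from page loads
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    return req.prepare()

# HYPOTHETICAL endpoint and dX values -- replace with the ones observed
# in the developer tools' Network tab
prepared = build_ajax_request(
    "http://www.football-lineups.com/ajax/get_table.php",  # assumed URL
    {"d1": "Real_Madrid", "d2": "2013-2014"},              # assumed params
)
# To actually send it: response = requests.Session().send(prepared)
```

The response is typically an HTML fragment, which can then be parsed with `lxml.html.fromstring` and queried with the original XPath.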

