Web Scraping a page with multiple tables

Question

I am trying to scrape the second table from this website: https://fbref.com/en/comps/9/stats/Premier-League-Stats. However, when I try to get at the data by finding the table tag, I only ever manage to extract the information from the first table. Could anyone explain why I cannot access the second table, or show me how to do it?

import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
tables = soup.find_all("table")  # every <table> element in the parse tree
player_table = tables[0]

Recommended answer

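The second table is inside an HTML comment <!-- ... -->. BeautifulSoup stores the contents of a comment as a single Comment string instead of parsing them into tags, so find_all("table") never sees that table.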
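A minimal offline sketch of this behaviour follows. The HTML string is made up to imitate the page (only the all_stats_standard id comes from the answer's code). It shows that the visible table is found normally, while the commented one only becomes a real tag after the comment's text is parsed a second time.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one visible table, and a second table hidden in a comment,
# loosely imitating how the second table is stored on the page.
html = """
<div id="all_stats_squads"><table id="squads"><tr><td>Arsenal</td></tr></table></div>
<div id="all_stats_standard">
<!--
<table id="stats_standard"><tr><td>Patrick van Aanholt</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the uncommented table is part of the parse tree.
print(len(soup.find_all("table")))  # 1

# The second table exists only as the text of a Comment node.
comment = soup.find(string=lambda s: isinstance(s, Comment))
print(type(comment).__name__)  # Comment

# Parsing the comment text again turns it into real tags.
hidden = BeautifulSoup(comment, "html.parser")
print(hidden.find("table")["id"])  # stats_standard
```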

To get the table from comments, you can use this example:

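import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The player table is stored in an HTML comment after the div with id "all_stats_standard".
# Find the first Comment following that div and parse its text as HTML.
container = soup.select_one('#all_stats_standard')
comment = container.find_next(string=lambda x: isinstance(x, Comment))
table = BeautifulSoup(comment, 'html.parser')

# print some information from the table to screen
# (player, squad and birth year, i.e. cells 0, 3 and 5 of each data row):
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))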

Prints:

Patrick van Aanholt           Crystal Palace      1990      
Max Aarons                    Norwich City        2000      
Tammy Abraham                 Chelsea             1997      
Che Adams                     Southampton         1996      
Adrián                        Liverpool           1987      
Sergio Agüero                 Manchester City     1988      
Albian Ajeti                  West Ham            1997      
Nathan Aké                    Bournemouth         1995      
Marc Albrighton               Leicester City      1989      
Toby Alderweireld             Tottenham           1989      

...and so on.
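Because the page holds several tables, some rendered normally and some commented out, it can be handy to collect both kinds in one pass. The sketch below uses a hypothetical helper, tables_by_id (not part of BeautifulSoup), and assumes the tables you want carry an id attribute; tables without one are skipped. The sample HTML and its ids are invented for illustration.

```python
from bs4 import BeautifulSoup, Comment


def tables_by_id(soup):
    """Hypothetical helper: map table id -> <table> tag, including tables
    that only exist inside HTML comments."""
    found = {}

    # Tables that are already part of the parse tree.
    for table in soup.find_all("table"):
        if table.get("id"):
            found[table["id"]] = table

    # Tables hidden in comments: re-parse each comment that contains "<table".
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" not in comment:
            continue
        fragment = BeautifulSoup(comment, "html.parser")
        for table in fragment.find_all("table"):
            if table.get("id"):
                found.setdefault(table["id"], table)

    return found


# Invented markup: one visible table and one commented-out table.
html = """
<table id="squads"><tr><td>Arsenal</td></tr></table>
<div id="all_stats_standard">
<!-- <table id="stats_standard"><tr><td>Max Aarons</td></tr></table> -->
</div>
"""
tables = tables_by_id(BeautifulSoup(html, "html.parser"))
print(sorted(tables))  # ['squads', 'stats_standard']
```

On the real page you would pass in the soup built from requests.get(url).content and then pick the table you need by its id, which you can look up in the page source.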