BeautifulSoup - only returning first table

Problem description

I've been working with BeautifulSoup lately. I'm trying to get data from the https://www.pro-football-reference.com/teams/mia/2000_roster.htm page. Specifically, all I want is the player name and 'gs' (games started).

However, when I run it, it only returns the data from the first ('Starters') table. I'm not interested in that top table at all; I want the second table, titled 'Roster'.

Here's the code I was running. Like I said, I don't really need anything other than player name and games started; I was just practicing and learning BeautifulSoup.

import pandas as pd
import requests
import bs4

alpha = requests.get('https://www.pro-football-reference.com/teams/mia/2000_roster.htm')

beta = bs4.BeautifulSoup(alpha.text,'lxml')


gama = beta.findAll('th',{'data-stat':'pos'})
position = [th.text for th in gama]
position = position[1:]
position = list(filter(None, position))

gama = beta.findAll('td',{'data-stat':'player'})
player = [td.text for td in gama]
player = player[1:]
while 'Defensive Starters' in player: player.remove('Defensive Starters')
while 'Special Teams Starters' in player: player.remove('Special Teams Starters')

gama = beta.findAll('td',{'data-stat':'age'})
age = [td.text for td in gama]
age = list(filter(None, age))

gama = beta.findAll('td',{'data-stat':'gs'})
gs = [td.text for td in gama]
gs = list(filter(None, gs))

target = pd.DataFrame({
    'player_name': player,
    'position': position,
    'gs': gs,
    'age': age,
})

Anyone see where I'm going wrong? Or maybe an alternative way to go about it?

Recommended answer

To get the content of that table you need a browser automation tool, because that part of the page is generated dynamically. The data in the first table, by contrast, is accessible without one. I used Selenium here:

from bs4 import BeautifulSoup
from selenium import webdriver

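driver = webdriver.Chrome()  # launches a local Chrome instance
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
# page_source is the DOM as rendered by the browser, after the page's JavaScript has run
soup = BeautifulSoup(driver.page_source, "lxml")
# the second table container on the page holds the 'Roster' table
table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    player = items.select("[data-stat='player']")[0].text
    gs = items.select("[data-stat='gs']")[0].text
    print(player, gs)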

driver.quit()

Partial output:

Player  GS
Trace Armstrong* 0
John Bock 1
Tim Bowens 15
Lorenzo Bromell 0
Autry Denson 0
Mark Dixon 15
Kevin Donnalley 16

If for some reason you run into an error with that version (for example an IndexError when a row has no 'player' or 'gs' cell), this version leaves no room for it, falling back to an empty string instead:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")
table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    # select_one returns None instead of raising when a row lacks the cell
    player_cell = items.select_one("[data-stat='player']")
    gs_cell = items.select_one("[data-stat='gs']")
    player = player_cell.text if player_cell is not None else ""
    gs = gs_cell.text if gs_cell is not None else ""
    print(player, gs)

driver.quit()
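A possible alternative that avoids a browser, offered as a sketch under an assumption I have not verified for this exact page: sports-reference sites are known to ship some secondary tables inside an HTML comment in the raw response and un-comment them with JavaScript. If that is what "generated dynamically" means here, plain requests plus BeautifulSoup would only see 'Starters' (as in the question), but the 'Roster' markup could still be parsed out of the comment. The table id "roster" is a hypothetical value to confirm in the page source, and SAMPLE_HTML is made-up markup that only mimics the assumed structure.

```python
from bs4 import BeautifulSoup, Comment


def find_table(html, table_id):
    """Return the <table> with the given id, or None.

    Looks in the normal markup first, then inside HTML comments.
    ASSUMPTION: the wanted table may be stored commented-out in the raw HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Comment nodes hold the commented-out markup as plain text; re-parse it.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if table_id not in comment:
            continue
        inner = BeautifulSoup(str(comment), "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


def player_games_started(table):
    """Yield (player, gs) pairs from body rows, skipping rows missing either cell."""
    for row in table.select("tbody tr"):
        player = row.select_one("[data-stat='player']")
        gs = row.select_one("[data-stat='gs']")
        if player is not None and gs is not None:
            yield player.get_text(strip=True), gs.get_text(strip=True)


# Hypothetical markup: one live table plus one table hidden in a comment.
SAMPLE_HTML = """
<html><body>
<table id="starters"><tbody>
<tr><td data-stat="player">Starter A</td><td data-stat="gs">16</td></tr>
</tbody></table>
<div><!--
<table id="roster"><tbody>
<tr><td data-stat="player">John Bock</td><td data-stat="gs">1</td></tr>
<tr><td data-stat="player">Tim Bowens</td><td data-stat="gs">15</td></tr>
<tr><td data-stat="player">Row without gs</td></tr>
</tbody></table>
--></div>
</body></html>
"""

roster = find_table(SAMPLE_HTML, "roster")
for player, gs in player_games_started(roster):
    print(player, gs)
```

Against the live page this would look something like find_table(requests.get(page_url).text, "roster"), and the pairs can go straight into pd.DataFrame(list(pairs), columns=["player_name", "gs"]). Building rows per table row like this also sidesteps the parallel-list length mismatches in the question's code. If find_table returns None on the real page, the table really is built client-side and the Selenium approach above is the way to go.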
