Scraping wrong table

Question
I'm trying to get the advanced stats of players onto an Excel sheet, but the table being scraped is the first one instead of the advanced stats table.

```
ValueError: Length of passed values is 23, index implies 21
```
If I try to use the id instead, I get another error about tbody.
Also, I get an error about

```
lname=name.split(" ")[1]
IndexError: list index out of range
```

I think that has to do with 'Nene' in the list. Is there a way to fix that?
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

playernames = ['Carlos Delfino',
               'Yao Ming',
               'Andris Biedrins',
               'Nene']

for name in playernames:
    fname = name.split(" ")[0]
    lname = name.split(" ")[1]
    url = "https://basketball.realgm.com/search?q={}+{}".format(fname, lname)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'}).find_next('tbody')
    print(table)
    columns = ['Season', 'Team', 'League', 'GP', 'GS', 'TS%', 'eFG%', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'TOV%', 'STL%', 'BLK%', 'USG%', 'Total S%', 'PPR', 'PPS', 'ORtg', 'DRtg', 'PER']
    df = pd.DataFrame(columns=columns)
    trs = table.find_all('tr')
    for tr in trs:
        tds = tr.find_all('td')
        row = [td.text.replace('\n', '') for td in tds]
        df = df.append(pd.Series(row, index=columns), ignore_index=True)

df.to_csv('international players.csv', index=False)
```
Answer
Brazilians often go by a single name in soccer (think Fred). If you want to use their moniker (Nene/Fred) then you need to implement exception handling for this, something like:
```python
try:
    lname = name.split(" ")[1]
except IndexError:
    lname = name
```
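As a quick sanity check, here is how that fallback behaves on the list from the question (offline, no scraping involved); single-named players simply reuse the whole name:

```python
playernames = ['Carlos Delfino', 'Yao Ming', 'Andris Biedrins', 'Nene']

pairs = []
for name in playernames:
    fname = name.split(" ")[0]
    try:
        lname = name.split(" ")[1]
    except IndexError:
        # Single-name players like 'Nene' have no second token;
        # fall back to the full name.
        lname = name
    pairs.append((fname, lname))

print(pairs)
# [('Carlos', 'Delfino'), ('Yao', 'Ming'), ('Andris', 'Biedrins'), ('Nene', 'Nene')]
```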
For your scraping issue, try using find_all instead of find. This will give you every data table on the given page, and you can then pull the correct table out of the list.
Change

```python
table = soup.find('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'})
```

to

```python
tables = soup.find_all('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'})
```
FYI, the table IDs also change every time you refresh the page, so you can't use an ID as a search mechanism.
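Putting that together, here is a minimal sketch of picking the right table out of the find_all result list. The HTML below is a made-up stand-in, not the live realgm page, and identifying the advanced table by its 'TS%' header cell is an assumption about that page's markup:

```python
from bs4 import BeautifulSoup

# Two tables with the same class: a basic one and an "advanced" one.
# Only the advanced one has a 'TS%' column in its header.
html = """
<table class="tablesaw"><thead><tr><th>Season</th><th>PTS</th></tr></thead>
<tbody><tr><td>2010</td><td>12.3</td></tr></tbody></table>
<table class="tablesaw"><thead><tr><th>Season</th><th>TS%</th></tr></thead>
<tbody><tr><td>2010</td><td>0.55</td></tr></tbody></table>
"""

soup = BeautifulSoup(html, 'html.parser')
tables = soup.find_all('table', attrs={'class': 'tablesaw'})

# Select the table whose header row contains an advanced-stats column,
# instead of relying on its position or its unstable id.
advanced = next(t for t in tables if t.find('th', string='TS%'))
print(advanced.find('tbody').find('td').text)  # -> 2010
```

Matching on a known header label survives page refreshes, which the randomized `id` attribute does not.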