Scraping the wrong table

Question

I'm trying to get players' advanced stats into an Excel sheet, but the table being scraped is the first one on the page instead of the advanced stats table. I get this error:

ValueError: Length of passed values is 23, index implies 21

If I try to select the table by its id instead, I get another error about tbody.

I also get an error on this line:

lname=name.split(" ")[1]
IndexError: list index out of range. 

I think that has to do with 'Nene' in the list, since that name has no second word. Is there a way to fix that?

import requests
import pandas as pd
from bs4 import BeautifulSoup
playernames=['Carlos Delfino',
'Yao Ming',
'Andris Biedrins',
'Nene']

for name in playernames:
  fname=name.split(" ")[0]
  lname=name.split(" ")[1]
  url="https://basketball.realgm.com/search?q={}+{}".format(fname,lname)
  response = requests.get(url)

  soup = BeautifulSoup(response.content, 'html.parser')
  table = soup.find('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'}).find_next('tbody')
  print(table)  

  columns = ['Season', 'Team', 'League', 'GP', 'GS', 'TS%', 'eFG%', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'TOV%', 'STL%', 'BLK%', 'USG%', 'Total S%', 'PPR', 'PPS', 'ORtg', 'DRtg', 'PER']
  df = pd.DataFrame(columns=columns)

  trs = table.find_all('tr')
  for tr in trs:
    tds = tr.find_all('td')
    row = [td.text.replace('\n', '') for td in tds]
    df = df.append(pd.Series(row, index=columns), ignore_index=True)

df.to_csv('international players.csv', index=False) 

Recommended answer

Brazilian athletes often go by a single name, as in soccer (think Fred). If you want to search by a moniker such as Nene or Fred, you need to handle the case where there is no last name, with something like

try:
    lname=name.split(" ")[1]
except IndexError:
    lname=name
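As an alternative to the try/except above, the search URL can be built without splitting the name at all. This is only a sketch: it assumes the search endpoint from the question accepts the whole name as one `q` value with spaces encoded as `+`, and `build_search_url` is a hypothetical helper name, not something from the original code.

```python
from urllib.parse import quote_plus

# Same search endpoint as in the question; the whole name goes into q.
SEARCH_URL = "https://basketball.realgm.com/search?q={}"

def build_search_url(name):
    # quote_plus turns spaces into '+' and escapes other special characters,
    # so a single-word name such as 'Nene' needs no special case.
    return SEARCH_URL.format(quote_plus(name.strip()))

playernames = ['Carlos Delfino', 'Yao Ming', 'Andris Biedrins', 'Nene']
urls = [build_search_url(name) for name in playernames]
for url in urls:
    print(url)
```

Note that with the try/except version a one-word name is sent twice (first and last name are both 'Nene'), whereas this version sends it once.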

For your scraping issue, try using find_all instead of find. find returns only the first matching table, while find_all returns a list of every matching table on the page, and you can then pull the correct table out of that list.

That is, change table = soup.find('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'}) so that it calls find_all instead of find, and leave out the {'id':'table-3554'} argument.
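A minimal sketch of picking the right table out of the find_all result, assuming the advanced stats table can be recognized by its header cells (for example 'TS%' and 'PER' from the column list in the question). The HTML here is made up for illustration, and `find_table_with_headers` is a hypothetical helper; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a player page with several stats tables.
SAMPLE_HTML = """
<table class="tablesaw" id="table-1">
  <thead><tr><th>Season</th><th>Team</th><th>PTS</th></tr></thead>
  <tbody><tr><td>2010-11</td><td>HOU</td><td>10.2</td></tr></tbody>
</table>
<table class="tablesaw" id="table-2">
  <thead><tr><th>Season</th><th>Team</th><th>TS%</th><th>PER</th></tr></thead>
  <tbody><tr><td>2010-11</td><td>HOU</td><td>0.601</td><td>20.1</td></tr></tbody>
</table>
"""

def find_table_with_headers(soup, required):
    """Return the first <table> whose header cells include every name in `required`."""
    for table in soup.find_all('table', attrs={'class': 'tablesaw'}):
        headers = {th.get_text(strip=True) for th in table.find_all('th')}
        if set(required) <= headers:
            return table
    return None  # no table on the page has all the required headers

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
advanced = find_table_with_headers(soup, ['TS%', 'PER'])

# Read the body rows of the selected table only.
rows = [[td.get_text(strip=True) for td in tr.find_all('td')]
        for tr in advanced.find('tbody').find_all('tr')]
print(rows)
```

Matching on header text rather than on position in the list keeps the selection working if the page adds or reorders tables.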

Also, FYI, the table IDs change every time you refresh the page, so you can't use the ID as a search mechanism.
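On the ValueError from the question (23 values passed, 21 implied by the index): it means a scraped row has a different number of cells than the columns list, which is what happens when the wrong table is read. The sketch below, using made-up rows and a shortened column list, shows one way to guard against that and to build the DataFrame once from a list of rows. It also avoids DataFrame.append, which was removed in pandas 2.0. The guard only hides mismatched rows; selecting the correct table is still the real fix.

```python
import pandas as pd

# Shortened, illustrative column list (the question uses 21 columns).
columns = ['Season', 'Team', 'PER']

# Made-up rows; the second has an extra cell, like the 23-vs-21 mismatch.
scraped_rows = [
    ['2010-11', 'HOU', '20.1'],
    ['2011-12', 'HOU', '18.3', 'extra'],
]

good_rows = []
for row in scraped_rows:
    if len(row) != len(columns):
        # Report and skip rows that do not match the expected layout.
        print("skipping row with {} cells, expected {}".format(len(row), len(columns)))
        continue
    good_rows.append(row)

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(good_rows, columns=columns)
print(df)
```

Collecting rows from every player into one list before creating the DataFrame also keeps earlier players' rows from being discarded on each loop iteration.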
