AssertionError: 22 columns passed, passed data had 21 columns

Question

Here is my code:

from urllib import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.basketball-reference.com/draft/NBA_2014.html"
html = urlopen(url)
soup = BeautifulSoup(html)
column_headers = [th.getText() for th in soup.findAll('tr',limit=2)[1].findAll('th')]
data_rows = soup.findAll('tr')[2:]
player_data = [[td.getText() for td in data_rows[i].findAll('td')] for i in range(len(data_rows))] #PLAYER DATA 

type(soup)
type(data_rows)

df = pd.DataFrame(player_data,columns=column_headers)

The error seems to occur on the last line.

Recommended answer

First of all, the error is pretty straightforward: your column_headers list has 22 entries, but each row in player_data has only 21. So you need to find out which column is missing and why. Visually comparing the rows with the headers list suggests that one of the first two columns is missing. player_data[0][0] returns

1, CLE, Andrew Wiggins, University of Kansas,...

but it should be

1, 1, CLE, Andrew Wiggins, University of Kansas,...
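
One way to confirm the mismatch before building the DataFrame is to compare the header count with the row lengths. This is a minimal sketch; the short lists below are made-up stand-ins for the scraped column_headers and player_data, not the real page contents:

```python
from collections import Counter

# Hypothetical stand-ins for the scraped values; the real lists are longer.
column_headers = ["Rk", "Pk", "Tm", "Player", "College"]
player_data = [
    ["1", "CLE", "Andrew Wiggins", "University of Kansas"],
    ["2", "MIL", "Jabari Parker", "Duke University"],
]

# Count how many rows have each length, then compare with the header count.
row_lengths = Counter(len(row) for row in player_data)
print("headers:", len(column_headers))
print("row lengths:", dict(row_lengths))
```

If every row is exactly one shorter than the header list, a whole column is being skipped rather than a few cells being missing.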

The problem is the table itself. Open the page in a browser, right-click the table, and choose Inspect.

The first row of data (underneath 'Rk') consists of 21 td elements and 1 th element. The "Rk" entry is actually a th element, not a td.

This is why

player_data = [[td.getText() for td in data_rows[i].findAll('td')] for i in range(len(data_rows))] 

skips the first column: it only iterates over td elements, hence the different length. I don't know how important the first column is; a quick fix would be to drop the Rk column from your headers list.
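
The quick fix could look roughly like this. The sample lists are hypothetical, and the sketch assumes "Rk" is the first entry in the headers list:

```python
import pandas as pd

# Hypothetical sample: the headers include "Rk", the rows lack the rank cell.
column_headers = ["Rk", "Pk", "Tm", "Player"]
player_data = [
    ["1", "CLE", "Andrew Wiggins"],
    ["2", "MIL", "Jabari Parker"],
]

# Drop the first header so the header count matches the row width.
df = pd.DataFrame(player_data, columns=column_headers[1:])
print(df)
```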

Alternatively, search for both td and th elements:

player_data = [[td.getText() for td in data_rows[i].findAll(['td','th'])] for i in range(len(data_rows))]
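
To see the difference without hitting the website, here is a small sketch on an inline HTML snippet. The table is a made-up miniature that only mimics the structure described above (rank cell as th, remaining cells as td), and it uses the find_all / get_text spellings and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical miniature table: the rank cell is a <th>, the rest are <td>.
html = """
<table>
  <tr><th>Rk</th><th>Pk</th><th>Tm</th><th>Player</th></tr>
  <tr><th>1</th><td>1</td><td>CLE</td><td>Andrew Wiggins</td></tr>
  <tr><th>2</th><td>2</td><td>MIL</td><td>Jabari Parker</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")
column_headers = [th.get_text() for th in rows[0].find_all("th")]
data_rows = rows[1:]

# Only <td> cells: the rank column is silently skipped.
td_only = [[c.get_text() for c in row.find_all("td")] for row in data_rows]

# Both <td> and <th> cells, returned in document order.
td_and_th = [[c.get_text() for c in row.find_all(["td", "th"])] for row in data_rows]

print(len(column_headers), len(td_only[0]), len(td_and_th[0]))
```

On the real page, rows that contain only th cells (such as repeated header rows, if any) would also be picked up by the second version, so they may need to be filtered out before building the DataFrame.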
