Why does a column from pandas DataFrame not work in this loop?


Problem description

I have a DataFrame that I took from basketball-reference with player names. The code below is how I built the DataFrame. It has 5 columns of player names, but each name also has the player's position.

url = "http://www.basketball-reference.com/awards/all_league.html"
dframe_list = pd.io.html.read_html(url)
df = dframe_list[0]
df.drop(df.columns[[0,1,2]], inplace=True, axis=1)
column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
df.columns = column_names
df = df[df.name1.notnull()]

I am trying to split off the position. To do so I had planned to make a DataFrame for each name column:

name1 = pd.DataFrame(df.name1.str.split().tolist()).ix[:,0:1]
name1[0] = name1[0] + " " + name1[1]
name1.drop(name1.columns[[1]], inplace=True, axis=1)

Since I have five columns I thought I would do this with a loop

column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
for column in column_names:
    column = pd.DataFrame(df.column.str.split().tolist()).ix[:,0:1]
    column[0] = column[0] + " " + column[1]
    column.drop(column.columns[[1]], inplace=True, axis=1)
    column.columns = column

And then I'd join all these DataFrames back together.

df_NBA = [name1, name2, name3, name4, name5]
df_NBA = pd.concat(df_NBA, axis=1)

I'm new to Python, so I'm sure I'm doing this in a pretty cumbersome fashion and would love suggestions as to how I might do this faster. But my main question is this: when I run the code on individual columns it works fine, but when I run the loop I get the error:

AttributeError: 'DataFrame' object has no attribute 'column'

It seems that the part of the loop df.column.str is causing some problem? I've fiddled around with the list, with bracketing column (I still don't understand why sometimes I bracket a DataFrame column and sometimes it's .column, but that's a bigger issue) and other random things.

When I try @BrenBarn's suggestion

df.apply(lambda c: c.str[:-2])

The following pops up in the Jupyter notebook:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation:    http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

Looking at the DataFrame, nothing has actually changed. If I understand the documentation correctly, this method creates a copy of the DataFrame with the edits, but that copy is temporary and gets thrown out afterward, so the actual DataFrame doesn't change.

Solution

If the position labels are always only one character, the simple solution is this:

>>> df.apply(lambda c: c.str[:-2])
           name1         name2
0     Marc Gasol  Lebron James
1      Pau Gasol  Kevin Durant
2  Dwight Howard  Kyrie Irving

The str attribute of a Series lets you do string operations, including indexing, so this just trims the last two characters (the space and the one-letter position) off each value.
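Here is a minimal sketch of that trimming on made-up data. The sample rows and the name trimmed are assumptions for illustration, not taken from the real table. It also shows that apply returns a new DataFrame and leaves the original alone, so the result has to be assigned to a name if you want to keep it.

```python
import pandas as pd

# Hypothetical sample data: every value ends with a space plus a one-letter position.
df = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol C", "Dwight Howard C"],
    "name2": ["Lebron James F", "Kevin Durant F", "Kyrie Irving G"],
})

# apply() runs the lambda on each column (a Series) and builds a new DataFrame.
# c.str[:-2] slices every string in the column, dropping its last two characters.
trimmed = df.apply(lambda c: c.str[:-2])

print(trimmed)  # the trimmed names
print(df)       # the original DataFrame still contains the position letters
```

If you want df itself to hold the trimmed names, assign the result back with df = df.apply(lambda c: c.str[:-2]).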

As for your question about df.column, this issue is more general than pandas. These two things are not the same:

# works
obj.attr

# doesn't work
attrName = 'attr'
obj.attrName

You can't use the dot notation when you want to access an attribute whose name is stored in a variable. In general, you can use the getattr function instead. However, pandas provides the bracket notation for accessing a column by specifying the name as a string (rather than a source-code identifier). So these two are equivalent:

df.some_column

columnName = "some_column"
df[columnName]
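
To make the difference concrete, here is a small sketch with an assumed one-row DataFrame. The data and variable names are illustrative only. Dot notation looks for an attribute literally named column, while getattr and the bracket notation both use the string stored in the variable.

```python
import pandas as pd

# Hypothetical one-row DataFrame, just for illustration.
df = pd.DataFrame({"name1": ["Marc Gasol C"], "name2": ["Lebron James F"]})

column = "name1"  # the column name is stored in a variable

# df.column asks for an attribute literally called "column", which does not exist.
try:
    df.column
except AttributeError as err:
    print(err)

# Both of these look the column up by the value of the variable instead.
via_getattr = getattr(df, column)
via_brackets = df[column]

print(via_getattr.equals(via_brackets))
```

The bracket form is usually the safer habit in pandas, since it also works for column names that are not valid Python identifiers.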

In your example, changing your reference to df.column to df[column] should resolve that issue. However, as I mentioned in a comment, your code has other problems too. As far as solving the task at hand, the string-indexing approach I showed at the beginning of my answer is much simpler.
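
If you do want to keep a loop, one possible shape is sketched below. This is not the code from the question. It assumes made-up sample data, and it swaps the .ix-based splitting for str.rsplit(n=1), which splits once from the right so that only the last token (the position) is separated from the name.

```python
import pandas as pd

# Hypothetical sample data with two of the five name columns.
df = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol C"],
    "name2": ["Lebron James F", "Kevin Durant F"],
})

column_names = ["name1", "name2"]

pieces = []
for column in column_names:
    # df[column] uses the string held in the loop variable as the column name.
    # rsplit(n=1) yields [name, position] for each value; .str[0] keeps the name.
    names = df[column].str.rsplit(n=1).str[0]
    pieces.append(names)

# Each piece is a Series that kept its column name, so concat rebuilds the table.
df_NBA = pd.concat(pieces, axis=1)
print(df_NBA)
```

Collecting the pieces in a list avoids rebinding the loop variable to a DataFrame, which is one of the other problems in the original loop.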
