Applying a lambda function to a column fails in pandas


Question

I don't understand why different indexing methods behave inconsistently when I apply a function column-wise.

The DataFrame is:

import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')])
df.columns = ['A', 'B']

I want to apply a lambda to the second column, but pandas reports that the Series object has no such attribute:

print(df.iloc[:, 1:2].apply(lambda x: x.upper()).head())
AttributeError: ("'Series' object has no attribute 'upper'", u'occurred at index B')
print(df.loc[:, ['B']].apply(lambda x: x.upper()).head())
AttributeError: ("'Series' object has no attribute 'upper'", u'occurred at index B')

However, the following indexing method works:

print(df.loc[:, 'B'].apply(lambda x: x.upper()).head())

Why? I thought the three indexing methods were equivalent. Printed, their results look almost the same. The first two give:

   B
0  Hello
1  World

while printing df.loc[:, 'B'] gives:

0  Hello
1  World
Name: B, dtype: object

What does the difference mean?

Recommended answer

When you index with 'B', you get a Series. When you index with 1:2 or with ['B'], you get a DataFrame with one column. When you use apply on a Series, your function is called on each element. When you use apply on a DataFrame, your function is called on each column.
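To make the difference visible, here is a small sketch that records what type of object the lambda receives in each case. The variable names (series, frame_a, frame_b, seen_from_series, seen_from_frame) are illustrative and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame([(1, "Hello"), (2, "World")], columns=["A", "B"])

series = df.loc[:, "B"]       # scalar label -> Series
frame_a = df.loc[:, ["B"]]    # list of labels -> one-column DataFrame
frame_b = df.iloc[:, 1:2]     # positional slice -> one-column DataFrame

# Record the type of the argument the function receives on each call.
seen_from_series = []
series.apply(lambda x: seen_from_series.append(type(x).__name__))

seen_from_frame = []
frame_a.apply(lambda x: seen_from_frame.append(type(x).__name__))

print(type(series).__name__, type(frame_a).__name__, type(frame_b).__name__)
print(seen_from_series)  # one entry per element of the Series
print(seen_from_frame)   # entries for whole columns, not elements
```

With a Series, the recorded types are those of the individual values (strings here); with a DataFrame, they are Series objects, one per column. Some older pandas versions may call the function on the first column more than once, so the length of seen_from_frame can vary.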

So no, they aren't equivalent. With a Series, the lambda receives each string in turn, so x.upper() works. With a one-column DataFrame it fails, because the lambda is passed the whole column as its argument, and that column is a Series, which has no upper method.
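Building on that, here are a few ways to get the uppercase values, sketched under the assumption that column B holds only strings. The .str accessor used in options 2 and 3 is standard pandas but was not mentioned in the original answer; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([(1, "Hello"), (2, "World")], columns=["A", "B"])

# Option 1: select a Series, so the lambda receives each string.
upper_series = df["B"].apply(lambda s: s.upper())

# Option 2: use the vectorized string accessor on the Series directly.
upper_str = df["B"].str.upper()

# Option 3: keep the one-column DataFrame, but treat the lambda's argument
# as the Series it is and go through its .str accessor.
upper_frame = df[["B"]].apply(lambda col: col.str.upper())

print(upper_series)
print(upper_str)
print(upper_frame)
```

Options 1 and 2 return a Series, while option 3 returns a one-column DataFrame, so the choice mostly depends on which shape the rest of the code expects.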

You can see that they aren't the same because they print differently. They look almost the same, but not quite: the first has a column header, indicating that it's a DataFrame; the second has no column header but shows "Name: B" at the bottom, indicating that it's a Series.
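Rather than relying on the printed form, the two results can also be told apart programmatically. This is a minimal sketch using the type, ndim, and shape of each object; the names as_series and as_frame are illustrative.

```python
import pandas as pd

df = pd.DataFrame([(1, "Hello"), (2, "World")], columns=["A", "B"])

as_series = df.loc[:, "B"]     # one-dimensional
as_frame = df.loc[:, ["B"]]    # two-dimensional, with a single column

for label, obj in [("df.loc[:, 'B']", as_series), ("df.loc[:, ['B']]", as_frame)]:
    # A Series reports ndim 1; a DataFrame reports ndim 2 even with one column.
    print(label, type(obj).__name__, obj.ndim, obj.shape)
```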
