Why do I get an AttributeError when using pandas apply?
Problem description
How should I convert NaN values into a categorical value based on a condition? I am getting an error while trying to convert the NaN values.
category       gender  sub-category  title
health&beauty  NaN     makeup        lipbalm
health&beauty  women   makeup        lipstick
NaN            NaN     NaN           lipgloss
My DataFrame looks like the above. My function to convert NaN values in gender to a categorical value looks like this:
def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'

df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)
When I run the code, I get this error:
----> 7 if title.str.contains('Lip') and gender.isnull()==True:
8 print(gender)
9
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')
Full dataset: https://github.com/lakshmipriya04/py-sample
Recommended answer
A few things to note here:
- If you're using only two columns, calling apply over 4 columns is wasteful.
- Calling apply is wasteful in general, because it is slow and offers no vectorisation benefits.
- In apply (with axis=1), you're dealing with scalars, so you do not use the .str accessor as you would on a pd.Series object. A plain str has no .contains method either; use the in operator instead, e.g. "lip" in title.
- gender.isnull is completely wrong: gender is a scalar and has no isnull attribute. Use pd.isnull(gender) to test a scalar.
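If a row-wise apply is still wanted, the points above can be sketched roughly as follows. This is a minimal sketch under a few assumptions, not the original poster's code. The sample frame is rebuilt by hand from the table in the question, the check is made case-insensitive with lower(), and rows that do not match keep their existing gender (the original function would return None for them).

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    """Return 'women' for lip products with a missing gender, else the existing value."""
    title, gender = row["title"], row["gender"]
    # Inside apply(axis=1) these are scalars: use `in` and pd.isnull,
    # not the .str accessor or an .isnull() method.
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # assumption: leave non-matching rows unchanged

# Only the two columns the function needs are passed to apply.
imputed = df[["gender", "title"]].apply(impute_gender, axis=1)
print(imputed.tolist())  # ['women', 'women', 'women']
```

The isinstance check guards against a missing title, which would otherwise be a float NaN and fail on the in test.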
Option 1
np.where
import numpy as np
# True where gender is missing and the title contains 'lip'
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
This is not only fast, but simpler as well. If you're worried about case sensitivity, you can make your contains check case-insensitive:
import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
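As a sketch of the same check without the re module: str.contains also accepts case=False, and na=False makes rows with a missing title count as non-matches instead of propagating NaN into the mask. The sample values below are made up for illustration and are not from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a mixed-case title and a missing title.
titles = pd.Series(["LipBalm", "lipstick", "shampoo", np.nan])
gender = pd.Series([np.nan, "women", np.nan, np.nan])

# case=False ignores case; na=False treats a missing title as "no match".
m = gender.isnull() & titles.str.contains("lip", case=False, na=False)
print(m.tolist())  # [True, False, False, False]
```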
Option 2
Another alternative is using pd.Series.mask/pd.Series.where:
df['gender'] = df.gender.mask(m, 'women')
Or,
df['gender'] = df.gender.where(~m, 'women')
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
mask replaces the values in the column wherever the provided mask is True. where does the opposite and keeps values wherever its condition is True, which is why it is called with ~m.
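To make the relationship between the two calls concrete, here is a small sketch on a hand-made Series. The values and the boolean mask m are hypothetical and are not taken from the original dataset.

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, "women", np.nan, "men"])
m = pd.Series([True, False, True, False])  # hypothetical mask: rows to overwrite

masked = gender.mask(m, "women")   # replace where m is True
kept = gender.where(~m, "women")   # keep where ~m is True, replace elsewhere

print(masked.tolist())      # ['women', 'women', 'women', 'men']
print(masked.equals(kept))  # True
```

Both calls return a new Series and leave gender untouched, so the result has to be assigned back to the column as in the answer.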