Why do I get an AttributeError when using pandas apply?
Question
How should I convert NaN values into a categorical value based on a condition? I get an error when I try to convert the NaN values.
category       gender  sub-category  title
health&beauty  NaN     makeup        lipbalm
health&beauty  women   makeup        lipstick
NaN            NaN     NaN           lipgloss
My DataFrame looks like this. My function to convert NaN values in gender to a categorical value looks like this:
def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'

df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)
If I run the code, I get this error:
----> 7 if title.str.contains('Lip') and gender.isnull()==True:
8 print(gender)
9
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')
Full dataset: https://github.com/lakshmipriya04/py-sample
Recommended answer
Some things to note here:
- If you're using only two columns, calling apply over 4 columns is wasteful.
- Calling apply at all is wasteful and inefficient: it is slow, uses a lot of memory, and offers no vectorisation benefits.
- Inside apply you're dealing with scalars, so you cannot use the .str accessor as you would on a pd.Series object. That is what the error message says: title is a plain str, and a str has no .str attribute. A plain str has no contains method either; the Pythonic check is "lip" in title.
- gender.isnull fails for the same reason: gender is a scalar, so it has no isnull attribute. For a scalar, use pd.isnull(gender) instead.
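If you do want to keep apply, the points above can be sketched roughly as follows. This is only an illustration: the sample frame is rebuilt by hand from the table in the question, and the lower-casing of the title and the fall-through return of the existing gender are assumptions, not part of the original function.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (assumed to match the real data).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    """Return 'women' for lip products with a missing gender, else the existing value."""
    title, gender = row["title"], row["gender"]
    # title and gender are scalars here: use plain Python string checks and pd.isnull,
    # not the Series-only .str accessor or .isnull() method.
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender

# Pass only the two columns the function actually reads.
imputed = df[["gender", "title"]].apply(impute_gender, axis=1)
print(imputed.tolist())
```

This still loops over rows in Python, so the vectorised options are preferable on anything but small data.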
Option 1: np.where
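import numpy as np

# Rows where gender is missing and the title contains 'lip'
m = df.gender.isnull() & df.title.str.contains('lip')
# Fill those rows with 'women'; keep the existing gender everywhere else
df['gender'] = np.where(m, 'women', df.gender)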
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
This is not only fast, but simpler as well. If you're worried about case sensitivity, you can make the contains check case insensitive:
import re

m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
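As an alternative sketch, str.contains also accepts case=False, and na=False makes missing titles count as non-matches instead of propagating NaN into the mask. The small frame below is made up for illustration and is not taken from the linked dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with mixed-case titles and one missing title.
df = pd.DataFrame({
    "gender": [np.nan, "men", np.nan, np.nan],
    "title": ["Lipbalm", "LIPSTICK", np.nan, "shampoo"],
})

# case=False ignores case; na=False treats a missing title as "no match".
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)

# Only rows with a missing gender AND a matching title are filled.
df["gender"] = np.where(m, "women", df["gender"])
print(df)
```

Here only the first row is filled: the second already has a gender, the third has no title, and the fourth does not match.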
Option 2: pd.Series.mask / pd.Series.where
Another option is to use pd.Series.mask or pd.Series.where:
df['gender'] = df.gender.mask(m, 'women')
Or,
df['gender'] = df.gender.where(~m, 'women')
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
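mask replaces the values where the condition is True with the new value and leaves the rest untouched. where does the opposite: it keeps the values where the condition is True and replaces the rest, which is why the where call passes the inverted mask ~m.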
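A minimal sketch of that relationship, using a hand-written Series and boolean mask (both made up for illustration):

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, "women", np.nan, "men"])
m = pd.Series([True, False, False, False])  # rows to overwrite

# mask: replace where m is True.
masked = gender.mask(m, "women")
# where: keep where the condition is True, so pass the inverse of m.
kept = gender.where(~m, "women")

# Both calls give the same result; equals() treats NaNs in the same position as equal.
print(masked.equals(kept))
```

Which one to use is mostly a matter of which reads more naturally for the condition you already have.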