Why do I get an AttributeError when using pandas apply?


Question

How should I convert NaN values into a categorical value based on a condition? I get an error when I try to convert the NaN values.

category           gender     sub-category    title
health&beauty      NaN        makeup          lipbalm
health&beauty      women      makeup          lipstick
NaN                NaN        NaN             lipgloss

My DataFrame looks like this. My function to convert the NaN values in gender to a categorical value looks like this:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

If I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset: https://github.com/lakshmipriya04/py-sample

Recommended answer

Some things to note here:

  1. If you're using only two columns, calling apply over four columns is wasteful.
  2. Calling apply at all is inefficient here: it is slow, uses a lot of memory, and gives you none of the benefits of vectorisation.
  3. Inside apply, you're dealing with scalars, so you cannot use the .str accessor as you would on a pd.Series. Since title is a plain Python string, the membership test "lip" in title is enough.
  4. gender.isnull fails for the same reason: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) to test a scalar for NaN.
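Points 3 and 4 can be sketched as a row-wise function that treats each cell as a plain Python scalar. This is only an illustration, assuming the three-row sample from the question; the lowercase match on "lip" and the choice to return the existing gender when the condition fails are assumptions, not part of the original function.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    # Inside apply(axis=1) each cell is a scalar, so use plain Python
    # string operations and pd.isnull instead of .str / .isnull().
    gender, title = row["gender"], row["title"]
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # keep the existing value instead of returning None

# Only the two columns the function actually reads are passed in.
result = df[["gender", "title"]].apply(impute_gender, axis=1)
print(result.tolist())
```

This version runs without the AttributeError, but it still processes the frame one row at a time, so it does not address the performance concern in point 2.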


Option 1
np.where

import numpy as np
# True where gender is missing and the title contains 'lip'
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only faster, but simpler as well. If you're worried about case sensitivity, you can make the contains check case-insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
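As an alternative sketch, str.contains also accepts case=False, which gives a case-insensitive match without importing re. The na=False argument is added on the assumption that some titles may be missing, so that the mask stays strictly boolean. The sample data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with mixed-case titles and one missing title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["LipBalm", "lipstick", "Shampoo", np.nan],
})

# case=False makes the match case-insensitive; na=False treats a missing
# title as "no match" so the result contains only True/False.
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)
df["gender"] = np.where(m, "women", df["gender"])
print(m.tolist())
```

Rows whose title does not match, or is missing, keep their original gender value, including NaN.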


Option 2
Another option is to use pd.Series.mask / pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or:

df['gender'] = df.gender.where(~m, 'women')

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

mask replaces the values where m is True with the new value. where does the opposite: it keeps the values where its condition is True and replaces the rest, which is why it is called with ~m.
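To make the relationship between the two calls concrete, here is a small self-contained check, assuming the gender and title columns from the question's sample. It simply confirms that mask with m and where with ~m give the same result.

```python
import numpy as np
import pandas as pd

# Sample columns rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

m = df["gender"].isnull() & df["title"].str.contains("lip")

via_mask = df["gender"].mask(m, "women")     # replace where m is True
via_where = df["gender"].where(~m, "women")  # keep where ~m is True

print(via_mask.equals(via_where))
```

Which of the two to use is mostly a matter of which condition reads more naturally: "replace these rows" (mask) or "keep these rows" (where).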
