Why do I get an AttributeError when using pandas apply?


Question

How should I convert NaN values into a categorical value based on a condition? I get an error when I try to convert the NaN values.

category           gender     sub-category    title
health&beauty      NaN         makeup         lipbalm
health&beauty      women       makeup         lipstick
NaN                NaN         NaN            lipgloss

My DataFrame looks like this. My function to convert NaN values in gender to a categorical value looks like this:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

If I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset: https://github.com/lakshmipriya04/py-sample

Recommended answer

There are a few things to note here:

  1. If you're using only two columns, calling apply over 4 columns is wasteful.
  2. Calling apply is wasteful in general, because it is slow and offers no vectorisation benefits.
  3. Inside apply, you're dealing with scalars, so you cannot use the .str accessor as you would on a pd.Series object. A plain string has no contains method either; use the in operator instead: "lip" in title.
  4. gender.isnull is wrong: gender is a scalar and has no isnull attribute. Use pd.isnull(gender) to test a scalar for NaN.
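
If a row-wise apply is still wanted, points 3 and 4 can be sketched roughly as below. This is only an illustration, assuming a small hand-built DataFrame that mirrors the sample in the question; the row is read by column label instead of by position, and the lowercase comparison is an added assumption to make the check case insensitive.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's data.
df = pd.DataFrame({
    'category': ['health&beauty', 'health&beauty', np.nan],
    'gender': [np.nan, 'women', np.nan],
    'sub-category': ['makeup', 'makeup', np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss'],
})

def impute_gender(row):
    # With apply(axis=1) each cell is a plain Python scalar, so use
    # pd.isnull() and the `in` operator instead of .isnull() and .str.
    gender, title = row['gender'], row['title']
    if pd.isnull(gender) and isinstance(title, str) and 'lip' in title.lower():
        return 'women'
    return gender  # keep the existing value otherwise

# Only the two columns the function needs are passed to apply.
imputed = df[['gender', 'title']].apply(impute_gender, axis=1)
```

The function returns the existing gender when the condition does not hold, so rows that already have a value are not overwritten with None. The vectorised options below remain the faster choice.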


Option 1
np.where

import numpy as np
# m is True where gender is missing and the title contains 'lip'
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only faster, but simpler as well. If you're worried about case sensitivity, you can make the contains check case insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
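
A variant of the same mask, as a sketch: str.contains also accepts case=False, which avoids the re import, and na=False, which treats missing titles as non-matches. The sample frame here is made up for illustration and is not the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with mixed-case titles and one missing title.
df = pd.DataFrame({
    'gender': [np.nan, 'women', np.nan, np.nan],
    'title': ['Lipbalm', 'lipstick', 'LIPGLOSS', np.nan],
})

# case=False ignores letter case; na=False makes a NaN title count as "no match".
m = df['gender'].isnull() & df['title'].str.contains('lip', case=False, na=False)
df['gender'] = np.where(m, 'women', df['gender'])
```

The last row keeps its NaN gender because its title is missing, so the mask is False there.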


Option 2
Another alternative is to use pd.Series.mask/pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or,

df['gender'] = df.gender.where(~m, 'women')

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

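mask replaces the values of the column with the new value wherever the provided mask is True. where does the opposite: it keeps values where its condition is True and replaces the rest, which is why it takes ~m.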
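
The relationship between the two methods can be checked with a small sketch. The Series and the boolean condition below are hypothetical and chosen only to show that mask(m, value) and where(~m, value) give the same result.

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, 'women', np.nan, 'men'])
m = pd.Series([True, False, False, False])  # hypothetical condition

masked = gender.mask(m, 'women')   # replace where m is True
kept = gender.where(~m, 'women')   # keep where ~m is True, replace the rest
```

Only the first entry is replaced; the NaN at position 2 is left alone because the condition is False there.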
