python: pandas np.where vs. df.loc with multiple conditions


Question

np.where has been giving me a lot of errors, so I am looking for a solution that uses df.loc instead.

This is the message I keep getting from np.where:

C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

I am working with the following DataFrame, df:

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],'checked': ['0','0','1','0'],'duplicate': ['True','True','False','False']})

    Column_A    checked   duplicate
0   AAA             0      True
1   AAA             0      True
2   ABC             1      False
3   CDE             0      False

I want to create an additional flag column that is set when checked is 0 and duplicate is True.

I tried this, and it didn't work:

df['flag'] = (np.where((df['checked'] == 'Y') &(df['duplicate'] == 'True'), 'Y', '0'))

TypeError: invalid type comparison

I also tried it with df.loc:

df['flag'] = (df.loc[df['checked'] == 'Y']& df.loc[df['duplicate'] == 'True'], 'Y','0')

TypeError: invalid type comparison

I get the same error!

Recommended answer

I think your boolean values are real booleans, not strings, so you need to remove the quotes around True and False:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                   'checked': ['0','0','1','0'],
                   'duplicate': [True, True, False, False]})

# 'checked' never equals 'Y' in this data, so every row gets '0'
df['flag'] = np.where((df['checked'] == 'Y') & (df['duplicate'] == True), 'Y', '0')
print(df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0
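
If the duplicate column really does hold the strings 'True' and 'False', as in the constructor shown in the question, one option is to convert it to a boolean column first. The following is a minimal sketch under that assumption; the conversion step is not part of the original answer, and any value other than the exact string 'True' becomes False.

```python
import pandas as pd

# Same data as in the question, with 'duplicate' stored as strings.
df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': ['True', 'True', 'False', 'False']})

# Element-wise comparison yields a boolean Series;
# anything that is not exactly the string 'True' maps to False.
df['duplicate'] = df['duplicate'] == 'True'

print(df['duplicate'].tolist())
```

After this step the column can be used directly in a boolean mask.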

Or, when the column is already boolean, the == True comparison can be omitted:

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate']), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Also, the values in checked are strings, so the value you compare against needs quotes:

df['flag'] = np.where((df['checked'] == '0') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0

Solution with loc:

df['flag'] = '0'
mask = (df['checked'] == '0') &(df['duplicate'])
df.loc[mask, 'flag'] = 'Y'
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0
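
To compare the two approaches side by side, here is a small self-contained sketch that builds the same mask once and applies it with both np.where and df.loc. The column names flag_where and flag_loc are made up for this illustration and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# Build the combined condition once; each side is a boolean Series.
mask = (df['checked'] == '0') & df['duplicate']

# Approach 1: np.where picks 'Y' where mask is True, '0' elsewhere.
df['flag_where'] = np.where(mask, 'Y', '0')

# Approach 2: start from the default value, then overwrite matching rows.
df['flag_loc'] = '0'
df.loc[mask, 'flag_loc'] = 'Y'

print(df)
```

Both columns should end up identical. np.where fills the whole column in one step, while the loc version sets a default first and then overwrites only the matching rows.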
