np.where on multiple variables


Question

I have a data frame with:

customer_id [1,2,3,4,5,6,7,8,9,10]
feature1 [0,0,1,1,0,0,1,1,0,0]
feature2 [1,0,1,0,1,0,1,0,1,0]
feature3 [0,0,1,0,0,0,1,0,0,0]

Using this, I want to create a new variable (say new_var) such that when feature1 is 1, new_var = 1; otherwise, if feature2 is 1, new_var = 2; otherwise, if feature3 is 1, new_var = 3; else new_var = 4. I tried np.where, but although it doesn't give me an error, it doesn't do the right thing, so I guess a nested np.where works on a single variable only. In that case, what's the best way to perform a nested if / case-when in pandas?

My np.where code was something like this:

df['new_var'] = np.where(df['feature1']==1, '1', np.where(df['feature2']==1, '2', np.where(df['feature3']==1, '3', '4')))

Recommended answer

I think you need numpy.select. For each row it takes the choice belonging to the first condition that is True, and any later conditions are ignored:

m1 = df['feature1']==1 
m2 = df['feature2']==1    
m3 = df['feature3']==1 
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
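
To see the "first True wins" behavior in isolation, here is a minimal sketch. The three-row frame `demo` is made up for illustration and is not part of the question's data; it only assumes the same 0/1 feature columns.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the question):
# row 0 matches both conditions, row 1 only the second, row 2 neither.
demo = pd.DataFrame({'feature1': [1, 0, 0],
                     'feature2': [1, 1, 0]})

conditions = [demo['feature1'] == 1, demo['feature2'] == 1]
choices = ['1', '2']

# Conditions are checked in list order; the first True one decides the value.
first_wins = np.select(conditions, choices, default='4')

# Reversing both lists changes the priority, so row 0 now gets '2'.
reversed_priority = np.select(conditions[::-1], choices[::-1], default='4')

print(first_wins.tolist())         # ['1', '2', '4']
print(reversed_priority.tolist())  # ['2', '2', '4']
```

The order of the condition list therefore plays the same role as the order of branches in an if / elif chain.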

Sample:

import numpy as np
import pandas as pd

customer_id = [1,2,3,4,5,6,7,8,9,10]
feature1 = [0,0,1,1,0,0,1,1,0,0]
feature2 = [1,0,1,0,1,0,1,0,1,0]
feature3  = [0,0,1,0,0,0,1,0,0,0]

df = pd.DataFrame({'customer_id':customer_id,
                   'feature1':feature1,
                   'feature2':feature2,
                   'feature3':feature3})

m1 = df['feature1']==1 
m2 = df['feature2']==1    
m3 = df['feature3']==1 
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print (df)
   customer_id  feature1  feature2  feature3 new_var
0            1         0         1         0       2
1            2         0         0         0       4
2            3         1         1         1       1
3            4         1         0         0       1
4            5         0         1         0       2
5            6         0         0         0       4
6            7         1         1         1       1
7            8         1         0         0       1
8            9         0         1         0       2
9           10         0         0         0       4

If the feature columns can only contain 0 and 1, you can convert 0 to False and 1 to True instead of comparing with 1:

m1 = df['feature1'].astype(bool)
m2 = df['feature2'].astype(bool)
m3 = df['feature3'].astype(bool)
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print (df)
   customer_id  feature1  feature2  feature3 new_var
0            1         0         1         0       2
1            2         0         0         0       4
2            3         1         1         1       1
3            4         1         0         0       1
4            5         0         1         0       2
5            6         0         0         0       4
6            7         1         1         1       1
7            8         1         0         0       1
8            9         0         1         0       2
9           10         0         0         0       4
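
For comparison with the approach in the question, the sketch below rebuilds the sample frame and checks a nested np.where against np.select. It assumes the quoting in the question's snippet is fixed; with that assumption, a nested np.where that tests a different column at each level gives the same labels on this data, so np.select is mainly the more readable option here.

```python
import numpy as np
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({'customer_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'feature1': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
                   'feature2': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                   'feature3': [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]})

# Nested np.where: each inner call is the "else" branch of the outer one.
nested = np.where(df['feature1'] == 1, '1',
                  np.where(df['feature2'] == 1, '2',
                           np.where(df['feature3'] == 1, '3', '4')))

# Equivalent np.select call with the same priority order.
selected = np.select([df['feature1'] == 1,
                      df['feature2'] == 1,
                      df['feature3'] == 1],
                     ['1', '2', '3'], default='4')

# True when both approaches label every row identically.
same = bool((nested == selected).all())
print(same)
```

Note that with this data no row ever receives '3', because every row where feature3 is 1 also has feature1 equal to 1, and the feature1 condition is checked first.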
