Nested np.where

Problem description

I have the following dataframe:

S A
1 1
1 0
2 1
2 0

I want to create a new 'Result' column that is calculated from the values of both column A and column S.

I wrote the following nested np.where code:

df['Result'] = np.where((df.S == 1 & df.A == 1), 1,
                        (df.S == 1 & df.A == 0), 0,
                        (df.S == 2 & df.A == 1), 0,
                        (df.S == 2 & df.A == 0), 1))))

But when I execute it, I get the following error:

SyntaxError: invalid syntax

What is wrong with my code?

Recommended answer

As far as I know, np.where does not support more than two return values: it takes a single condition, one value for where the condition is True, and one value for where it is False. So either you rewrite your np.where so that one combined condition returns 1 for True and 0 for False, or you use masks. Note also that each comparison needs its own parentheses, because & binds more tightly than ==.
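For comparison, here is a minimal sketch of what a genuinely nested np.where could look like: each "else" branch is itself another np.where call. The sample frame is rebuilt from the question, and the final fallback of np.nan for rows matching no condition is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"S": [1, 1, 2, 2], "A": [1, 0, 1, 0]})

# Every comparison is wrapped in parentheses before combining with &.
# The innermost fallback (np.nan) is an assumed default for unmatched rows.
df["Result"] = np.where(
    (df.S == 1) & (df.A == 1), 1,
    np.where(
        (df.S == 1) & (df.A == 0), 0,
        np.where(
            (df.S == 2) & (df.A == 1), 0,
            np.where((df.S == 2) & (df.A == 0), 1, np.nan),
        ),
    ),
)
print(df)
```

Because np.nan is a float, the resulting column is float-valued. The nesting becomes hard to read quickly, which is one reason to prefer the approaches in this answer.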

If you rewrite np.where, you are limited to two results, and the second result is always set wherever the condition is not True. So it is also set for rows outside your four cases, such as S == 5 or a missing (NaN) value in A.

df['Result'] = np.where(((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)), 1, 0)
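To make the caveat concrete, the following sketch applies that single np.where to the question's data plus one extra row. The extra row (S = 5, A = NaN) is hypothetical and is added only to show that anything not matching the condition falls through to the second value.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row that matches none of the four cases
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, np.nan]})

# One combined condition: True where the result should be 1
is_one = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))

# Everything else, including the unmatched last row, receives 0
df["Result"] = np.where(is_one, 1, 0)
print(df)
```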

When using masks, you can apply an arbitrary number of conditions and results. For your example, the solution looks like this:

mask_0 = ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1))
mask_1 = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))
df.loc[mask_0, 'Result'] = 0
df.loc[mask_1, 'Result'] = 1
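As an alternative that is not part of the original answer, the same two masks can be passed to np.select, which pairs a list of conditions with a list of values and takes a default for rows that match none of them. The sketch below assumes the question's data plus one hypothetical unmatched row, and assumes np.nan as the default.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row that matches no mask
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, np.nan]})

mask_0 = ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1))
mask_1 = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))

# Conditions and values are matched by position; default covers unmatched rows
df["Result"] = np.select([mask_0, mask_1], [0, 1], default=np.nan)
print(df)
```

If several conditions are True for the same row, np.select uses the first matching one in the list, so the order of the masks matters when they overlap.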

Where no condition is met, Result is set to np.nan. In my opinion this is the failsafe behaviour and should therefore be preferred. If you want zeros in these locations instead, just initialize the Result column with zeros first.
Of course, this can be simplified for special cases such as having only 1 and 0 as results, and it can be extended to any number of results by using dicts or other containers.
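One way the dict-based extension could look is sketched below. The `masks` dictionary, which maps each result value to its boolean mask, is a hypothetical name introduced for illustration, and the extra row (S = 5, A = NaN) is added only to show what happens to unmatched rows. The column is initialized explicitly here so that the default is visible in the code.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row that matches no mask
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, np.nan]})

# Hypothetical container: result value -> boolean mask selecting its rows
masks = {
    0: ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1)),
    1: ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)),
}

# Unmatched rows keep this initial value; use 0 here to default to zero instead
df["Result"] = np.nan
for value, mask in masks.items():
    df.loc[mask, "Result"] = value
print(df)
```

Adding a further result then only requires another entry in the dictionary.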
