Create a variable in a Pandas dataframe based on information in the dataframe
Problem description
I have a dataframe organized in the following way:
  var1  var2 var3  var4
0    A    23    B     7
1    B    13    C     4
2    C    12    A    11
3    A     5    C    15
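For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame shown above. The dict-of-lists construction is just one convenient way to do it and is an assumption; the original data source is not specified.

```python
import pandas as pd

# Rebuild the example frame; column names and values are copied from the table above.
df = pd.DataFrame(
    {
        "var1": ["A", "B", "C", "A"],
        "var2": [23, 13, 12, 5],
        "var3": ["B", "C", "A", "C"],
        "var4": [7, 4, 11, 15],
    }
)
print(df)
```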
I now want to create a new variable (column), var5, which takes the value of var2 if var1 == A and the value of var4 if var3 == A. For simplicity, var1 and var3 can never both have the value A. If neither var1 nor var3 takes the value A, then I want NaN. That is, the outcome in this example would be:
  var1  var2 var3  var4 var5
0    A    23    B     7   23
1    B    13    C     4  NaN
2    C    12    A    11   11
3    A     5    C    15    5
How can I achieve this?
Recommended answer
Option 1
Sounds like you can use np.where for this:
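import numpy as np

i = df.var1 == 'A'  # rows where var1 is A
j = df.var3 == 'A'  # rows where var3 is A
# var2 where i, otherwise var4 where j, otherwise NaN (np.NaN was removed in NumPy 2.0)
df['var5'] = np.where(i, df.var2, np.where(j, df.var4, np.nan))
df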
  var1  var2 var3  var4  var5
0    A    23    B     7  23.0
1    B    13    C     4   NaN
2    C    12    A    11  11.0
3    A     5    C    15   5.0
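If you want to run Option 1 end to end, something like the sketch below should work. The frame construction and the sanity check on the two masks are my additions and are not part of the original listing. The check simply enforces the question's assumption that var1 and var3 are never both A in the same row.

```python
import numpy as np
import pandas as pd

# Example frame from the question (rebuilt here so the snippet is self-contained).
df = pd.DataFrame(
    {
        "var1": ["A", "B", "C", "A"],
        "var2": [23, 13, 12, 5],
        "var3": ["B", "C", "A", "C"],
        "var4": [7, 4, 11, 15],
    }
)

i = df["var1"] == "A"
j = df["var3"] == "A"

# Added check: the question states both conditions never hold on the same row.
assert not (i & j).any()

# Nested np.where: var2 where i, otherwise var4 where j, otherwise NaN.
df["var5"] = np.where(i, df["var2"], np.where(j, df["var4"], np.nan))
print(df)
```

The integers come back as floats (23.0 rather than 23) because NaN is a float, so the whole column is upcast to float64.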
Option 2
An alternative would be np.select:
df['var5'] = np.select([i, j], [df.var2, df.var4], default=np.nan)
df
  var1  var2 var3  var4  var5
0    A    23    B     7  23.0
1    B    13    C     4   NaN
2    C    12    A    11  11.0
3    A     5    C    15   5.0
Note: i and j are the same variables defined in the code listing for Option 1.
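Since np.select takes parallel lists of conditions and choices, it is fairly easy to wrap in a small function. The sketch below does that. The helper name value_for_label and its label parameter are hypothetical, introduced only for illustration, and the frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Example frame from the question (rebuilt here so the snippet is self-contained).
df = pd.DataFrame(
    {
        "var1": ["A", "B", "C", "A"],
        "var2": [23, 13, 12, 5],
        "var3": ["B", "C", "A", "C"],
        "var4": [7, 4, 11, 15],
    }
)


def value_for_label(frame, label):
    """Hypothetical helper: var2 where var1 == label, var4 where var3 == label, else NaN."""
    conditions = [frame["var1"] == label, frame["var3"] == label]
    choices = [frame["var2"], frame["var4"]]
    # For each row, np.select uses the choice paired with the first condition that is True.
    return np.select(conditions, choices, default=np.nan)


df["var5"] = value_for_label(df, "A")
print(df)
```

Because the first matching condition wins, var1 would take precedence if both columns ever held the label in the same row. Adding another rule means appending one entry to each list rather than nesting another np.where.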
Option 3
pd.Series.mask / where
df.var2.mask(~i, df.var4.mask(~j, np.nan))
0    23.0
1     NaN
2    11.0
3     5.0
Name: var2, dtype: float64
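The heading mentions where as well, but only the mask form is shown. As a sketch, the where form should be the mirror image: where keeps values where the condition is True, so the masks do not need to be negated. The frame is rebuilt here so the snippet is self-contained, and the intermediate name fallback is mine.

```python
import numpy as np
import pandas as pd

# Example frame from the question (rebuilt here so the snippet is self-contained).
df = pd.DataFrame(
    {
        "var1": ["A", "B", "C", "A"],
        "var2": [23, 13, 12, 5],
        "var3": ["B", "C", "A", "C"],
        "var4": [7, 4, 11, 15],
    }
)

i = df["var1"] == "A"
j = df["var3"] == "A"

# mask replaces values where the condition is True, hence the negated masks.
via_mask = df["var2"].mask(~i, df["var4"].mask(~j, np.nan))

# where keeps values where the condition is True; the default replacement is NaN.
fallback = df["var4"].where(j)
via_where = df["var2"].where(i, fallback)

df["var5"] = via_where
print(df)
```

Both forms return a Series that is still named var2, so assign it to df['var5'] (as above) or rename it if you want the result stored under the new column name.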