Comparing two columns in a pandas dataframe to create a third one
Question
I have the following dataframe:
In [25]: df1
Out[25]:
a b
0 0.752072 0.813426
1 0.868841 0.354665
2 0.944651 0.745505
3 0.485834 0.163747
4 0.001487 0.820176
5 0.904039 0.136355
6 0.572265 0.250570
7 0.514955 0.868373
8 0.195440 0.484160
9 0.506443 0.523912
Now I want to create another column df1['c'] whose value in each row is the maximum of df1['a'] and df1['b']. Thus, I would like to have this as output:
In [25]: df1
Out[25]:
a b c
0 0.752072 0.813426 0.813426
1 0.868841 0.354665 0.868841
2 0.944651 0.745505 0.944651
3 0.485834 0.163747 0.485834
4 0.001487 0.820176 0.820176
I tried:
In [23]: df1['c'] = np.where(max(df1['a'], df1['b'], df1['a'], df1['b'])
However, this throws a syntax error. I don't see any way to do this in pandas. My actual dataframe is far more complex, so I would like a generic solution. Any ideas?
Recommended answer
You can use Series.where:
df['c'] = df.b.where(df.a < df.b, df.a)
print (df)
a b c
0 0.752072 0.813426 0.813426
1 0.868841 0.354665 0.868841
2 0.944651 0.745505 0.944651
3 0.485834 0.163747 0.485834
4 0.001487 0.820176 0.820176
5 0.904039 0.136355 0.904039
6 0.572265 0.250570 0.572265
7 0.514955 0.868373 0.868373
8 0.195440 0.484160 0.484160
9 0.506443 0.523912 0.523912
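As a reading aid: `Series.where(cond, other)` keeps the original value wherever `cond` is True and takes the value from `other` everywhere else. Below is a minimal sketch on the first three rows of the question's data; the name `small` is only illustrative and does not appear in the question.

```python
import pandas as pd

# First three rows of the question's data; `small` is a hypothetical name.
small = pd.DataFrame({
    "a": [0.752072, 0.868841, 0.944651],
    "b": [0.813426, 0.354665, 0.745505],
})

# Keep b where a < b holds; otherwise fall back to the value from a.
small["c"] = small["b"].where(small["a"] < small["b"], small["a"])
print(small)
```

Bracket access (`small["b"]`) is used here instead of attribute access (`small.b`) because it also works for column names that are not valid Python identifiers.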
Or use numpy.where:
df['c'] = np.where(df['a'] > df['b'], df['a'], df['b'])
print (df)
a b c
0 0.752072 0.813426 0.813426
1 0.868841 0.354665 0.868841
2 0.944651 0.745505 0.944651
3 0.485834 0.163747 0.485834
4 0.001487 0.820176 0.820176
5 0.904039 0.136355 0.904039
6 0.572265 0.250570 0.572265
7 0.514955 0.868373 0.868373
8 0.195440 0.484160 0.484160
9 0.506443 0.523912 0.523912
Or, more simply, take the row-wise max:
df['c'] = df[['a','b']].max(axis=1)
print (df)
a b c
0 0.752072 0.813426 0.813426
1 0.868841 0.354665 0.868841
2 0.944651 0.745505 0.944651
3 0.485834 0.163747 0.485834
4 0.001487 0.820176 0.820176
5 0.904039 0.136355 0.904039
6 0.572265 0.250570 0.572265
7 0.514955 0.868373 0.868373
8 0.195440 0.484160 0.484160
9 0.506443 0.523912 0.523912
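Since the question asks for a generic solution, here is a hedged sketch of how the row-wise max extends to more than two columns, and how it differs from an element-wise maximum when values are missing. The frame, the extra column `d`, and the result names `c_max` and `c_np` are assumptions made up for illustration. `np.maximum` is NumPy's element-wise maximum, swapped in here only for comparison; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value, to show how each approach treats NaN.
df = pd.DataFrame({
    "a": [0.75, np.nan, 0.94],
    "b": [0.81, 0.35, 0.74],
    "d": [0.10, 0.99, 0.20],
})

cols = ["a", "b", "d"]  # any number of columns can be compared

# Row-wise maximum over the selected columns; NaN is skipped by default.
df["c_max"] = df[cols].max(axis=1)

# Element-wise maximum of exactly two columns; NaN propagates to the result.
df["c_np"] = np.maximum(df["a"], df["b"])
print(df)
```

Which behaviour is preferable depends on whether a missing value should be ignored or should mark the whole row's result as missing.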