How to compute a new column based on the values of other columns in pandas - python
Problem description
Let's say my data frame contains these data:
>>> import pandas as pd
>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
...                    'b':['1','2','2','1','2','2']})
>>> df
a b
0 l1 1
1 l2 2
2 l1 2
3 l2 1
4 l1 2
5 l2 2
l1 should correspond to 1, whereas l2 should correspond to 2.
I'd like to create a new column 'c' such that, for each row, c = 1 if a = l1 and b = 1 (or a = l2 and b = 2). If a = l1 and b = 2 (or a = l2 and b = 1), then c = 0.
The resulting data frame should look like this:
a b c
0 l1 1 1
1 l2 2 1
2 l1 2 0
3 l2 1 0
4 l1 2 0
5 l2 2 1
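The correspondence rule (l1 corresponds to 1, l2 to 2) can also be written down explicitly as a lookup table. The following is a minimal sketch that assumes the mapping is complete. The dict name expected_b is hypothetical. Each label in a is translated to the value b should have, and the two are then compared row by row.

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# Hypothetical lookup table: the value of b that "matches" each label in a
expected_b = {'l1': '1', 'l2': '2'}

# Series.map translates each label; labels missing from the dict become NaN,
# which never compares equal to a string, so such rows would get c = 0
df['c'] = (df['a'].map(expected_b) == df['b']).astype(int)
print(df['c'].tolist())  # [1, 1, 0, 0, 0, 1]
```

This reproduces the expected column shown above. Adding a further label only means adding one dict entry.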
My data frame is very large so I'm really looking for the most efficient way to do this using pandas.
Benchmark data (one million rows):
import numpy
import pandas as pd
df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})
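%timeit is an IPython magic, so the timings quoted here will not run as written in a plain Python script. The following is a rough sketch of the same kind of measurement using the standard-library timeit module. The helper name fast_xnor is hypothetical. Absolute numbers will depend on the machine and the pandas version and are not expected to match the figures quoted.

```python
import timeit

import numpy
import pandas as pd

df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})

def fast_xnor(frame):
    # Same expression as the first solution: compare the two boolean masks
    return ((frame.a == 'l1') == (frame.b == '1')).astype(int)

# 3 trials, each calling the expression 3 times; keep the best per-call time
times = timeit.repeat(lambda: fast_xnor(df), repeat=3, number=3)
print(f"best of 3: {min(times) / 3 * 1000:.1f} ms per loop")
```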
A fast solution, assuming each column holds only two distinct values:
%timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)
10 loops, best of 3: 178 ms per loop
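This is why the first solution works. With exactly two labels per column, (a == 'l1') and (b == '1') are boolean masks. Comparing two masks with == acts as a logical XNOR. It is True when both masks are True (l1 with 1) or both are False (l2 with 2). The following sketch evaluates it on the four possible combinations. It assumes no other values occur. A hypothetical 'l3' row would be treated like l2: its a-mask is False, so it gets c = 1 whenever b is anything other than '1'.

```python
import pandas as pd

# All four (a, b) combinations that can occur with two labels per column
combos = pd.DataFrame({'a': ['l1', 'l1', 'l2', 'l2'],
                       'b': ['1', '2', '1', '2']})

mask_a = combos.a == 'l1'   # True for l1, False for l2
mask_b = combos.b == '1'    # True for '1', False for '2'

# Element-wise == on two boolean Series is XNOR: equal masks -> True
combos['c'] = (mask_a == mask_b).astype(int)
print(combos['c'].tolist())  # [1, 0, 0, 1]
```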
@Viktor Kerkes:
%timeit df['c'] = (df.a.str[-1] == df.b).astype(int)
1 loops, best of 3: 412 ms per loop
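Viktor Kerkes's version relies on the last character of each label in a being the digit it corresponds to. .str[-1] turns 'l1' into '1', and that result is compared element-wise with b. This assumes single-digit suffixes, so a hypothetical label such as 'l12' would yield '2'. A small check on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# .str[-1] takes the last character of every string in the column
last_char = df.a.str[-1]
print(last_char.tolist())  # ['1', '2', '1', '2', '1', '2']

df['c'] = (last_char == df.b).astype(int)
print(df.c.tolist())  # [1, 1, 0, 0, 0, 1]

# Limitation of the single-digit assumption: 'l12' ends in '2', not '12'
print(pd.Series(['l12']).str[-1].tolist())  # ['2']
```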
@user1470788:
%timeit df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)
1 loops, best of 3: 363 ms per loop
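user1470788's version spells out each matching (a, b) pair by hand. If there were more pairs, the same OR-of-ANDs pattern could be built from a list instead of being typed out. The following is a sketch in which the pairs list is hypothetical. It ORs together one mask per pair, which is equivalent to the hand-written | expression for the two pairs in the question.

```python
import functools
import operator

import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# Hypothetical list of (a, b) pairs that should produce c = 1
pairs = [('l1', '1'), ('l2', '2')]

# One boolean mask per pair: (a == label) & (b == digit)
masks = [(df['a'] == label) & (df['b'] == digit) for label, digit in pairs]

# operator.or_ applies | element-wise to the boolean Series
df['c'] = functools.reduce(operator.or_, masks).astype(int)
print(df['c'].tolist())  # [1, 1, 0, 0, 0, 1]
```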
@herrfz:
%timeit df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)
1 loops, best of 3: 387 ms per loop
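herrfz's version calls a Python lambda once per row through Series.apply. The pandas string accessor .str[1:] performs the same slice (dropping the leading 'l') on the whole column. The following sketch only checks that the two agree on the sample data. It makes no claim about speed relative to the timings above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# Row-by-row slicing with a Python lambda, as in herrfz's answer
via_apply = df.a.apply(lambda x: x[1:])

# Same slice through the pandas string accessor
via_str = df.a.str[1:]

# Compare values only (the two Series may differ in dtype across pandas versions)
assert via_apply.tolist() == via_str.tolist()

df['c'] = (via_str == df.b).astype(int)
print(df['c'].tolist())  # [1, 1, 0, 0, 0, 1]
```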