How to compute a new column based on the values of other columns in pandas - python

Views: 2003. Published: 2017/3/26 0:31:41. Tags: python, pandas, dataframe

Problem description

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b':['1','2','2','1','2','2']})
>>> df
    a       b
0  l1       1
1  l2       2
2  l1       2
3  l2       1
4  l1       2
5  l2       2

l1 should correspond to 1 whereas l2 should correspond to 2. I'd like to create a new column 'c' such that, for each row, c = 1 if a = l1 and b = 1 (or a = l2 and b = 2). If a = l1 and b = 2 (or a = l2 and b = 1) then c = 0.

The resulting data frame should look like this:

    a       b   c
0  l1       1   1
1  l2       2   1
2  l1       2   0
3  l2       1   0
4  l1       2   0
5  l2       2   1

My data frame is very large so I'm really looking for the most efficient way to do this using pandas.

Solution

import numpy
import pandas as pd
df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})

A fast solution, assuming each column takes only two distinct values:

%timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)

10 loops, best of 3: 178 ms per loop
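
The comparison of the two boolean masks works because, with only two labels per column, a row matches exactly when both tests agree: both True for l1/1, both False for l2/2. Here is a minimal sketch on the six-row frame from the question. The names `is_l1` and `is_1` are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

is_l1 = df.a == 'l1'   # True where a is 'l1', False where it is 'l2'
is_1 = df.b == '1'     # True where b is '1', False where it is '2'

# A row matches when both masks agree: (True, True) or (False, False).
df['c'] = (is_l1 == is_1).astype(int)
print(df['c'].tolist())  # [1, 1, 0, 0, 0, 1]
```

If either column can hold a third value, the two masks would both be False for unrelated rows and this shortcut would wrongly report a match.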

@Viktor Kerkes:

%timeit df['c'] = (df.a.str[-1] == df.b).astype(int)

1 loops, best of 3: 412 ms per loop

@user1470788:

%timeit df['c'] = (((df['a'] == 'l1') & (df['b'] == '1')) | ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int)

1 loops, best of 3: 363 ms per loop

@herrfz:

%timeit df['c'] = (df.a.apply(lambda x: x[1:]) == df.b).astype(int)

1 loops, best of 3: 387 ms per loop
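
Before comparing timings, it may be worth confirming that the four expressions give the same result. The sketch below runs each one on the small frame from the question and checks it against the expected column. The dictionary keys are descriptive labels chosen here, not names used by the answer authors.

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# Each entry is one of the expressions timed above, evaluated on the sample.
candidates = {
    'mask equality': ((df.a == 'l1') == (df.b == '1')).astype(int),
    'str[-1]': (df.a.str[-1] == df.b).astype(int),
    'explicit and/or': (((df['a'] == 'l1') & (df['b'] == '1')) |
                        ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int),
    'apply': (df.a.apply(lambda x: x[1:]) == df.b).astype(int),
}

expected = [1, 1, 0, 0, 0, 1]  # column c from the question
for name, result in candidates.items():
    print(name, result.tolist() == expected)
```

The expressions agree on this data but not in general. `str[-1]` compares only the last character of `a`, the `apply` version strips the first character, and the first expression assumes exactly two labels per column.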
