How to select/add a column to a pandas DataFrame based on a function of other columns?


Question

I have a data frame and I want to select the rows that match some criteria. The criteria are a function of the values of other columns and some additional values.

Here is a toy example:

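>>> import pandas as pd
>>> from random import randint
>>> # B is random, so its values differ between runs
>>> df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
...                    'B': [randint(1,9) for x in range(9)],
...                    'C': [4,10,3,5,4,5,3,7,1]})
>>> df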

      A  B   C
   0  1  6   4
   1  2  8  10
   2  3  8   3
   3  4  4   5
   4  5  2   4
   5  6  1   5
   6  7  1   3
   7  8  2   7
   8  9  8   1


I want to select all rows for which some function returns true, e.g. f(a, c, L) returns true iff the product A*C is in the specified list L, say L = [4, 20, 30] (though the function could be a less trivial one). That is, I want to get:

>>
      A  B   C
   0  1  6   4
   1  2  8  10
   3  4  4   5
   4  5  2   4
   5  6  1   5

Similarly, I'd like to add a fourth, binary column 'matched', which is True if A*C is in L:

      A  B   C  matched
   0  1  2   4    True
   1  2  5  10    True
   2  3  6   3   False
   3  4  3   5    True
   4  5  2   4    True
   5  6  6   5    True
   6  7  4   3   False
   7  8  5   7   False
   8  9  2   1   False

(Once this column is added you can easily select all the rows where it is True, but I suspect that once you can add the column you can also select.)

Is there an efficient and elegant way to do it without explicitly iterating over all indices? Thanks!

Recommended answer

Use isin on the product of the two columns:

In [5]:

L=[4,20,30]
df['Match'] = (df['A']*df['C']).isin(L)
df
Out[5]:
   A  B   C  Match
0  1  6   4   True
1  2  1  10   True
2  3  8   3  False
3  4  4   5   True
4  5  2   4   True
5  6  4   5   True
6  7  4   3  False
7  8  7   7  False
8  9  4   1  False
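The answer above adds the column; the question also asks how to select the matching rows. A minimal sketch of that step, assuming the same toy data (column B is random in the question, so fixed values are used here purely for reproducibility): the boolean Series returned by isin can be used directly as a row mask.

```python
import pandas as pd

# B is random in the question; these fixed values are an assumption
# made only so the example is reproducible.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'B': [6, 8, 8, 4, 2, 1, 1, 2, 8],
                   'C': [4, 10, 3, 5, 4, 5, 3, 7, 1]})
L = [4, 20, 30]

# Boolean Series with one entry per row: True where A*C is in L
mask = (df['A'] * df['C']).isin(L)

# Indexing with the mask keeps only the rows where it is True
selected = df[mask]
print(selected)
```

Storing the mask in a column first (as in the answer) and then filtering with df[df['Match']] should give the same rows.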

Timings (row-wise apply versus the vectorized isin):

In [9]:

%%timeit
L=[4,20,30]
rowindex = df.apply(lambda x : True if (x['A'] * x['C']) in L else False, axis=1)
df.loc[rowindex,'match'] = True
df.loc[~rowindex,'match'] = False
100 loops, best of 3: 3.13 ms per loop
In [11]:

%%timeit 
L=[4,20,30]
df['Match'] = (df['A']*df['C']).isin(L)

1000 loops, best of 3: 678 µs per loop
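The question notes that the real function may be less trivial than a product. When no vectorized expression is available, one option (not part of the original answer) is a list comprehension over the zipped columns, which still avoids indexing rows by hand. In this sketch, `f` is a hypothetical stand-in for the user's predicate, and no timing claim is made for it.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'C': [4, 10, 3, 5, 4, 5, 3, 7, 1]})
L = [4, 20, 30]


def f(a, c, allowed):
    """Hypothetical predicate: True if the product a*c is in `allowed`."""
    return a * c in allowed


# A set makes each membership test cheap when L is long
allowed = set(L)

# Evaluate f once per row, pairing up the values of columns A and C
df['matched'] = [f(a, c, allowed) for a, c in zip(df['A'], df['C'])]

# The new boolean column doubles as a row filter
selected = df[df['matched']]
print(selected)
```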
