How to select/add a column to a pandas DataFrame based on a non-trivial function of other columns


Problem description

This is a follow-up to this question: how to select/add a column to pandas dataframe based on a function of other columns?

I have a data frame and I want to select the rows that match some criterion. The criterion is a function of the values of other columns and some additional values.

Here is a toy example:

>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
...                    'B': [randint(1,9) for _ in range(9)],
...                    'C': [4,10,3,5,4,5,3,7,1]})
>>> df
   A  B   C
0  1  6   4
1  2  8  10
2  3  8   3
3  4  4   5
4  5  2   4
5  6  1   5
6  7  1   3
7  8  2   7
8  9  8   1

I want to select all rows for which some non-trivial function returns True, e.g. f(a,c,L), where L is a list of lists and f returns True iff a and c are not part of the same sublist. That is, if L = [[1,2,3],[4,2,10],[8,7,5,6,9]] I want to get:

   A  B   C
0  1  6   4
3  4  4   5
4  5  2   4
6  7  1   3
8  9  8   1
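
One way to read the rule, sketched here under the assumption that "part of the same sublist" means at least one sublist of L contains both values, is a plain Python function applied row by row. The name `in_different_sublists` is made up for illustration, and column B uses fixed values instead of `randint` so the result is reproducible. Under this reading, a value that appears in no sublist counts as not sharing a sublist.

```python
import pandas as pd

L = [[1, 2, 3], [4, 2, 10], [8, 7, 5, 6, 9]]

def in_different_sublists(a, c, sublists):
    """Return True iff no single sublist contains both a and c."""
    return not any(a in sub and c in sub for sub in sublists)

# Fixed B values (an assumption for reproducibility; the question uses randint).
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'B': [6, 8, 8, 4, 2, 1, 1, 2, 8],
                   'C': [4, 10, 3, 5, 4, 5, 3, 7, 1]})

# Build a boolean mask row by row, then use it to select rows.
mask = df.apply(lambda row: in_different_sublists(row['A'], row['C'], L), axis=1)
selected = df[mask]
print(selected)
```

Row-wise `apply` is easy to read but calls the Python function once per row, so it may be slow on large frames.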

Thanks!

Solution

Here is a VERY VERY hacky and inelegant solution. As another disclaimer: since your question doesn't state what you want to do if a number in a column is in none of the sublists, this code doesn't handle that case in any real way beyond the default behaviour of isin().
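
For reference, `Series.isin` returns False for any value that is not in the list it is given, so a number that appears in no sublist produces False in every membership column. A small sketch (the value 42 is a made-up example of such a number):

```python
import pandas as pd

s = pd.Series([1, 42, 10])

# Element-wise membership test against each list.
in_first = s.isin([1, 2, 3])
in_second = s.isin([4, 2, 10])

print(in_first.tolist())   # [True, False, False]
print(in_second.tolist())  # [False, False, True]
```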

import pandas as pd

df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
               'B': [6,8,8,4,2,1,1,2,8],
               'C': [4,10,3,5,4,5,3,7,1]})

L = [[1,2,3],[4,2,10],[8,7,5,6,9]]


# For each sublist, flag rows where exactly one of A and C is a member (XOR).
df['passed1'] = df['A'].isin(L[0])
df['passed2'] = df['C'].isin(L[0])
df['1&2'] = (df['passed1'] ^ df['passed2'])

df['passed4'] = df['A'].isin(L[1])
df['passed5'] = df['C'].isin(L[1])
df['4&5'] = (df['passed4'] ^ df['passed5'])

df['passed7'] = df['A'].isin(L[2])
df['passed8'] = df['C'].isin(L[2])
df['7&8'] = (df['passed7'] ^ df['passed8'])

# & binds tighter than ^ in Python; the parentheses only make that explicit.
df['PASSED'] = (df['1&2'] & df['4&5']) ^ df['7&8']

# Drop the helper columns, keep the rows that passed, then drop the flag itself.
del df['passed1'],  df['passed2'], df['1&2'], df['passed4'], df['passed5'], df['4&5'], df['passed7'], df['passed8'], df['7&8']
df = df[df['PASSED']]
del df['PASSED']

With an output that looks like:

    A   B   C
0   1   6   4
3   4   4   5
4   5   2   4
6   7   1   3
8   9   8   1

I implemented this rather quickly, hence the utter and complete ugliness of this code, but I believe you can refactor it any way you like (e.g. iterate over the original set of lists with for sub_list in L, improve the variable names, come up with a better solution, etc.).
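
The refactoring hinted at above (looping with `for sub_list in L`) might look like the sketch below. Note that it swaps in a different combination rule from the XOR columns: a row is marked as "shared" when both A and C are members of the same sublist, the marks are OR-ed across sublists, and the result is negated. This is an assumption about the intended semantics, not the original answer's logic; it works for any number of sublists, and it treats a value found in no sublist as not sharing one. The hand-written XOR combination happens to give the same rows on this data, but the two can disagree elsewhere: for A=2, C=5 (no common sublist) the XOR expression evaluates to False.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'B': [6, 8, 8, 4, 2, 1, 1, 2, 8],
                   'C': [4, 10, 3, 5, 4, 5, 3, 7, 1]})

L = [[1, 2, 3], [4, 2, 10], [8, 7, 5, 6, 9]]

# Start with "no shared sublist found" for every row.
shared = pd.Series(False, index=df.index)
for sub_list in L:
    # A row shares this sublist when both A and C are members of it.
    shared |= df['A'].isin(sub_list) & df['C'].isin(sub_list)

# Keep only the rows whose A and C never appear together in a sublist.
result = df[~shared]
print(result)
```

No helper columns are added to `df`, so there is nothing to delete afterwards.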

Hope this helps. Oh, and did I mention this was hacky and not very good code? Because it is.
