Selecting from pandas dataframe (or numpy ndarray?) by criterion
Question
I find myself coding this sort of pattern a lot:
tmp = <some operation>
result = tmp[<boolean expression>]
del tmp
...where <boolean expression> is to be understood as a boolean expression involving tmp. (For the time being, tmp is always a pandas dataframe, but I suppose that the same pattern would show up if I were working with numpy ndarrays; not sure.)
For example:
tmp = df.xs('A')['II'] - df.xs('B')['II']
result = tmp[tmp < 0]
del tmp
As one can guess from the del tmp at the end, the only reason for creating tmp at all is so that I can use a boolean expression involving it inside an indexing expression applied to it.
I would love to eliminate the need for this (otherwise useless) intermediate, but I don't know of any efficient [1] way to do this. (Please correct me if I'm wrong!)
As second best, I'd like to push this pattern off to some helper function. The problem is finding a decent way to pass the <boolean expression> to it. I can only think of indecent ones. E.g.:
def filterobj(obj, criterion):
    return obj[eval(criterion % 'obj')]
This actually works [2]:
filterobj(df.xs('A')['II'] - df.xs('B')['II'], '%s < 0')
# Int
# 0 -1.650107
# 2 -0.718555
# 3 -1.725498
# 4 -0.306617
# Name: II
...but using eval always leaves me feeling all yukky 'n' stuff... Please let me know if there's some other way.
[1] E.g., any approach I can think of involving the filter built-in is probably inefficient, since it would apply the criterion (some lambda function) by iterating, "in Python", over the pandas (or numpy) object...
[2] The definition of df used in the last expression above would be something like this:
import itertools
import pandas as pd
import numpy as np
a = ('A', 'B')
i = range(5)
ix = pd.MultiIndex.from_tuples(list(itertools.product(a, i)),
                               names=('Alpha', 'Int'))
c = ('I', 'II', 'III')
# len(ix), not len(idx): the index variable defined above is named ix
df = pd.DataFrame(np.random.randn(len(ix), len(c)), index=ix, columns=c)
Recommended answer
Because of the way Python works, I think this one's going to be tough. I can only think of hacks which get you part of the way there. Something like
def filterobj(obj, fn):
    return obj[fn(obj)]
filterobj(df.xs('A')['II'] - df.xs('B')['II'], lambda x: x < 0)
should work, unless I've missed something. Using lambdas this way is one of the usual tricks for delaying evaluation.
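As a quick check of the lambda approach, here is a minimal sketch. The seeded frame is made up to stand in for the question's df (its values are an assumption, not the original data). It also shows that reasonably recent pandas versions accept a callable inside .loc, which gives the same selection with neither a helper nor a temporary, and that the helper works unchanged on a plain numpy ndarray.

```python
import numpy as np
import pandas as pd


def filterobj(obj, fn):
    """Index obj with the boolean mask produced by fn(obj)."""
    return obj[fn(obj)]


# Made-up data standing in for the question's df (values are an assumption).
rng = np.random.default_rng(0)
ix = pd.MultiIndex.from_product([("A", "B"), range(5)], names=("Alpha", "Int"))
df = pd.DataFrame(rng.standard_normal((10, 3)), index=ix, columns=("I", "II", "III"))

# Reference result using the explicit temporary from the question.
tmp = df.xs("A")["II"] - df.xs("B")["II"]
expected = tmp[tmp < 0]

# Same selection through the helper: the lambda delays evaluation of the
# criterion until the object it applies to is available.
via_helper = filterobj(df.xs("A")["II"] - df.xs("B")["II"], lambda x: x < 0)

# Same selection with a callable indexer: .loc calls the function with the
# Series itself and uses the returned boolean mask.
via_loc = (df.xs("A")["II"] - df.xs("B")["II"]).loc[lambda s: s < 0]

# The helper is not pandas-specific; boolean masks work on ndarrays too.
arr = np.array([3.0, -1.0, 2.0, -4.0])
neg = filterobj(arr, lambda a: a < 0)
```

In all three cases the comparison itself is still vectorized; only the single call to the lambda happens "in Python", so this avoids the per-element cost that the filter built-in would incur.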
Thinking out loud: one could make a this object which isn't evaluated but just sticks around as an expression, something like
>>> this
this
>>> this < 3
this < 3
>>> df[this < 3]
Traceback (most recent call last):
File "<ipython-input-34-d5f1e0baecf9>", line 1, in <module>
df[this < 3]
[...]
KeyError: u'no item named this < 3'
and then either special-case the treatment of this into pandas or still have a function like
def filterobj(obj, criterion):
    return obj[eval(str(criterion.subs({"this": "obj"})))]
(with enough work we could lose the eval; this is simply a proof of concept) after which something like
>>> tmp = df["I"] + df["II"]
>>> tmp[tmp < 0]
Alpha Int
A 4 -0.464487
B 3 -1.352535
4 -1.678836
Dtype: float64
>>> filterobj(df["I"] + df["II"], this < 0)
Alpha Int
A 4 -0.464487
B 3 -1.352535
4 -1.678836
Dtype: float64
would work. I'm not sure any of this is worth the headache, though; Python simply isn't very conducive to this style.
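For what it's worth, one way the eval could be dropped is to have the placeholder record operations through operator overloading and replay them later. The following is only a sketch of that idea: Deferred, _resolve and this are hypothetical names invented here, not part of pandas or numpy, and only a handful of operators are covered.

```python
import operator

import pandas as pd


class Deferred:
    """Hypothetical placeholder that records operations instead of running them."""

    def __init__(self, fn=None):
        # fn maps the eventual object to a value; the default is the identity.
        self._fn = fn if fn is not None else (lambda obj: obj)

    def __call__(self, obj):
        # Replay the recorded operations against a concrete object.
        return self._fn(obj)

    def _binary(self, op, other):
        # Build a new Deferred that applies op once the object is known.
        return Deferred(lambda obj: op(self(obj), _resolve(other, obj)))

    def __lt__(self, other):
        return self._binary(operator.lt, other)

    def __le__(self, other):
        return self._binary(operator.le, other)

    def __gt__(self, other):
        return self._binary(operator.gt, other)

    def __ge__(self, other):
        return self._binary(operator.ge, other)

    def __and__(self, other):
        return self._binary(operator.and_, other)

    def __or__(self, other):
        return self._binary(operator.or_, other)


def _resolve(value, obj):
    """Evaluate value against obj if it is itself a Deferred expression."""
    return value(obj) if isinstance(value, Deferred) else value


this = Deferred()


def filterobj(obj, criterion):
    # criterion is a Deferred; calling it yields the boolean mask for obj.
    return obj[criterion(obj)]


s = pd.Series([1.5, -0.5, -2.0, 0.25])
negative = filterobj(s, this < 0)
banded = filterobj(s, (this > -1) & (this < 1))
```

Because a Deferred is just a callable, this ends up equivalent to passing a lambda; the only gain is the this < 0 spelling, which is arguably not worth the extra class.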