Alternatives to nested numpy.where for multiconditional pandas operations?
Question
I have a Pandas DataFrame with conditional column A and numeric column B.
A B
1 'foo' 1.2
2 'bar' 1.3
3 'foo' 2.2
I also have a Python dictionary that defines ranges of B which denote "success" given each value of A.
mydict = {'foo': [1, 2], 'bar': [2, 3]}
I want to make a new column, 'error', in the dataframe. It should describe how far outside of the acceptable bounds for A the value of B falls. If B is within the range, the value should be zero.
A B error
1 'foo' 1.2 0
2 'bar' 1.3 -0.7
3 'foo' 2.2 0.2
I'm not a complete Pandas/Numpy newbie, and I'm halfway decent at Python, but this proved somewhat difficult. I don't want to do it with iterrows(), since I understand that's computationally expensive and this is going to get called a lot.
I eventually figured out a solution by combining lambda functions, pandas.Series.map(), and nested numpy.where()s with given values for the optional x and y inputs.
getmin = lambda x: mydict[x][0]
getmax = lambda x: mydict[x][1]
df['error'] = np.where(df.B < df.A.map(getmin),
                       df.B - df.A.map(getmin),
                       np.where(df.B > df.A.map(getmax),
                                df.B - df.A.map(getmax),
                                0
                                )
                       )
It works, but this can't possibly be the best way to do this, right? I feel like I'm abusing numpy.where() to get around not knowing how to map values from multiple columns of a dataframe to a lambda function in a non-iterative way. (Also to avoid writing mildly gnarly lambda functions).
I guess I have three questions:
1) Is it OK to nest numpy.where()s for triconditional array operations?
2) How can I non-iteratively map from two dataframe columns to one function?
3) If 2) is possible and 1) is acceptable, which is preferable?
Recommended answer
For your question about how to map multiple columns, you do it with
DataFrame.apply(func, axis=1)
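As a minimal sketch of that row-wise approach, using the example data from the question: with axis=1, apply passes each row to the function as a Series, so both columns are available at once. The helper name range_error is hypothetical and only for illustration.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": [1.2, 1.3, 2.2]}, index=[1, 2, 3])
mydict = {"foo": [1, 2], "bar": [2, 3]}


def range_error(row):
    """Signed distance of row['B'] outside the range allowed for row['A'] (0 if inside).

    Hypothetical helper name, not part of pandas.
    """
    low, high = mydict[row["A"]]
    if row["B"] < low:
        return row["B"] - low
    if row["B"] > high:
        return row["B"] - high
    return 0.0


# axis=1 calls range_error once per row.
df["error"] = df.apply(range_error, axis=1)
print(df)
```

Note that apply with axis=1 still calls a Python function for every row, so on large frames it is usually slower than a vectorized expression.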
For your question I don't think you need this, but I think it's clearer if you do your calculation in a few steps:
df['low'] = df.A.map(lambda x: mydict[x][0])
df['high'] = df.A.map(lambda x: mydict[x][1])
df['error'] = np.maximum(df.B - df.high, 0) + np.minimum(df.B - df.low, 0)
df
A B low high error
1 foo 1.2 1 2 0.0
2 bar 1.3 2 3 -0.7
3 foo 2.2 1 2 0.2
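If you would rather keep the explicit three-way branching but avoid nesting, numpy.select is one option. It is a different technique from the one in this answer, shown here as a self-contained sketch on the question's example data; the variable names low and high are chosen to mirror the columns above.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": [1.2, 1.3, 2.2]}, index=[1, 2, 3])
mydict = {"foo": [1, 2], "bar": [2, 3]}

# Look up each row's bounds once instead of repeating the map in every branch.
low = df["A"].map(lambda key: mydict[key][0])
high = df["A"].map(lambda key: mydict[key][1])

# np.select takes parallel lists of conditions and choices. For each element,
# the first condition that is True picks the corresponding choice; rows that
# match no condition get the default.
df["error"] = np.select(
    [df["B"] < low, df["B"] > high],
    [df["B"] - low, df["B"] - high],
    default=0.0,
)

# Cross-check against the maximum/minimum formula from the answer.
check = np.maximum(df["B"] - high, 0) + np.minimum(df["B"] - low, 0)
print(df)
print(np.allclose(df["error"], check))
```

Adding a further condition then only means appending one entry to each list, rather than adding another level of nesting.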