Alternatives to nested numpy.where for multiconditional pandas operations?

Question

I have a Pandas DataFrame with conditional column A and numeric column B.

     A      B
1  'foo'  1.2
2  'bar'  1.3
3  'foo'  2.2

I also have a Python dictionary that defines ranges of B which denote "success" given each value of A.

mydict = {'foo': [1, 2], 'bar': [2, 3]}

I want to make a new column, 'error', in the dataframe. It should describe how far outside the acceptable bounds for A the value of B falls. If B is within the range, the value should be zero.

     A      B   error
1  'foo'  1.2    0
2  'bar'  1.3   -0.7
3  'foo'  2.2    0.2

I'm not a complete Pandas/Numpy newbie, and I'm halfway decent at Python, but this proved somewhat difficult. I don't want to do it with iterrows(), since I understand that's computationally expensive and this is going to get called a lot.

I eventually figured out a solution by combining lambda functions, pandas.Series.map(), and nested numpy.where() calls with given values for the optional x and y inputs.

getmin = lambda x: mydict[x][0]
getmax = lambda x: mydict[x][1]
df['error'] = np.where(df.B < df.A.map(getmin),
                       df.B - df.A.map(getmin),
                       np.where(df.B > df.A.map(getmax),
                                df.B - df.A.map(getmax),
                                0
                                )
                       )

It works, but this can't possibly be the best way to do this, right? I feel like I'm abusing numpy.where() to get around not knowing how to map values from multiple columns of a dataframe to a lambda function in a non-iterative way. (Also to avoid writing mildly gnarly lambda functions).

I guess I have three questions:

  1. Is it OK to nest numpy.where()s for triconditional array operations?
  2. How can I non-iteratively map from two dataframe columns to one function?
  3. If 2) is possible and 1) is acceptable, which is preferable?

Recommended answer

For your question about how to map multiple columns, you do it with

DataFrame.apply(func, axis=1)
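As a rough sketch of what that could look like for this problem (the helper name row_error is made up for illustration, and the sample frame is rebuilt from the question), each row is passed to the function as a Series, so both columns are available at once:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": [1.2, 1.3, 2.2]},
                  index=[1, 2, 3])
mydict = {"foo": [1, 2], "bar": [2, 3]}

def row_error(row):
    """Hypothetical helper: signed distance of B outside the range for A."""
    low, high = mydict[row["A"]]
    if row["B"] < low:
        return row["B"] - low
    if row["B"] > high:
        return row["B"] - high
    return 0.0

# axis=1 calls row_error once per row, with the row as a Series.
df["error"] = df.apply(row_error, axis=1)
print(df)
```

Note that this runs a Python function once per row, so it is likely to be slower than a vectorized approach on large frames.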

For your question I don't think you need this, but I think it's clearer if you do your calculation in a few steps:

df['low'] = df.A.map(lambda x: mydict[x][0])
df['high'] = df.A.map(lambda x: mydict[x][1])
df['error'] = np.maximum(df.B - df.high, 0) + np.minimum(df.B - df.low, 0)
df
     A    B  low  high  error
1  foo  1.2    1     2    0.0
2  bar  1.3    2     3   -0.7
3  foo  2.2    1     2    0.2
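If the goal is mainly to avoid nesting numpy.where(), two other vectorized options may be worth a look. The sketch below is not part of the original answer; it assumes the same sample data, and the names bounds, conditions, choices and clipped_error are illustrative. It uses numpy.select() for the multi-branch case and Series.clip() as a shorter equivalent for this particular formula:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": [1.2, 1.3, 2.2]},
                  index=[1, 2, 3])
mydict = {"foo": [1, 2], "bar": [2, 3]}

# Turn the dict into a lookup table, then map each A to its bounds.
bounds = pd.DataFrame.from_dict(mydict, orient="index", columns=["low", "high"])
low = df["A"].map(bounds["low"])
high = df["A"].map(bounds["high"])

# np.select picks the first matching condition per element, else the default.
conditions = [df["B"] < low, df["B"] > high]
choices = [df["B"] - low, df["B"] - high]
df["error"] = np.select(conditions, choices, default=0.0)

# Equivalent for this case: distance between B and B clipped into [low, high].
clipped_error = df["B"] - df["B"].clip(lower=low, upper=high)
print(df)
```

numpy.select() scales more readably than nested numpy.where() once there are more than two conditions, while the clip version only fits problems that reduce to "distance from an interval".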
