Modify function to return dataframe with specified values


Question

Below are my test data and the function I use to identify values that lie within thresh of each other.

Can anyone please help me modify the function so that it returns the desired output shown further down?

Test data

import pandas as pd
import numpy as np
from itertools import combinations

df2 = pd.DataFrame(
    {'AAA': [4, 5, 6, 7, 9, 10],
     'BBB': [10, 20, 30, 40, 11, 10],
     'CCC': [100, 50, 25, 10, 10, 11],
     'DDD': [98, 50, 25, 10, 10, 11],
     'EEE': [103, 50, 25, 10, 10, 11]})

Function:

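thresh = 5

def closeCols2(df):
    # df is a single row (a Series) when called via apply(..., axis=1)
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        # consider only pairs of values that differ by less than thresh
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value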

Data before applying the function:

    AAA BBB CCC DDD EEE
0   4   10  100 98  103
1   5   20  50  50  50
2   6   30  25  25  25
3   7   40  10  10  10
4   9   11  10  10  10
5   10  10  11  11  11

Current Series output after applying the function:

df2.apply(closeCols2, axis=1)

0    103
1     50
2     25
3     10
4     11
5     11
dtype: int64

The desired output is a dataframe that shows every value within thresh of another value in its row, and nan for any value that is not:

    AAA BBB CCC DDD EEE
0   nan nan 100 98  103
1   nan nan 50  50  50
2   nan 30  25  25  25
3   7   nan 10  10  10
4   9   11  10  10  10
5   10  10  11  11  11

Recommended answer

Use mask and sub, with the per-row maximum computed by apply along axis=1:

df2.mask(df2.sub(df2.apply(closeCols2, 1), 0).abs() > thresh)

    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
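The one-liner above can be wrapped into a function that returns the dataframe directly. The following is only a sketch: the names close_cols_max and mask_far_values are hypothetical and not part of the original answer, and it assumes that a row with no close pair should come back entirely as NaN.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def close_cols_max(row, thresh):
    """Largest value among pairs in `row` that differ by less than `thresh`."""
    close = [max(row[a], row[b])
             for a, b in combinations(row.index, 2)
             if abs(row[a] - row[b]) < thresh]
    # Assumption: NaN when no pair is close, so the whole row gets masked later.
    return max(close) if close else np.nan


def mask_far_values(df, thresh):
    """Keep values within `thresh` of the row's close-pair maximum, NaN elsewhere."""
    row_max = df.apply(close_cols_max, axis=1, thresh=thresh)
    # Subtract each row's maximum from that row (axis=0 aligns on the index).
    keep = df.sub(row_max, axis=0).abs() <= thresh
    return df.where(keep)


df2 = pd.DataFrame(
    {'AAA': [4, 5, 6, 7, 9, 10],
     'BBB': [10, 20, 30, 40, 11, 10],
     'CCC': [100, 50, 25, 10, 10, 11],
     'DDD': [98, 50, 25, 10, 10, 11],
     'EEE': [103, 50, 25, 10, 10, 11]})

result = mask_far_values(df2, thresh=5)
print(result)
```

Like the original one-liner, this compares every value with a single maximum per row, so it assumes each row contains at most one group of close values.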


Note:
I'd redefine closeCols2 to take thresh as a parameter. Then you can pass it in the apply call.

def closeCols2(df, thresh):
        max_value = None
        for k1,k2 in combinations(df.keys(),2):
            if abs(df[k1] - df[k2]) < thresh:
                if max_value is None:
                    max_value = max(df[k1],df[k2])
                else:
                    max_value = max(max_value, max(df[k1],df[k2]))
        return max_value 

df2.apply(closeCols2, 1, thresh=5)
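As a small illustration of why the parameter is useful, the sketch below passes the threshold both as a keyword and through apply's args tuple, and then tries a tighter threshold. The helper close_cols_max is a hypothetical, condensed stand-in for closeCols2 and is not taken from the original answer.

```python
from itertools import combinations

import pandas as pd


def close_cols_max(row, thresh):
    """Largest value among pairs in `row` that differ by less than `thresh`."""
    close = [max(row[a], row[b])
             for a, b in combinations(row.index, 2)
             if abs(row[a] - row[b]) < thresh]
    return max(close) if close else None


df2 = pd.DataFrame(
    {'AAA': [4, 5, 6, 7, 9, 10],
     'BBB': [10, 20, 30, 40, 11, 10],
     'CCC': [100, 50, 25, 10, 10, 11],
     'DDD': [98, 50, 25, 10, 10, 11],
     'EEE': [103, 50, 25, 10, 10, 11]})

# Extra keyword arguments to apply are forwarded to the function.
by_keyword = df2.apply(close_cols_max, axis=1, thresh=5)
# The same threshold passed positionally through args.
by_args = df2.apply(close_cols_max, axis=1, args=(5,))
# With thresh=3, 103 is no longer "close" to 100 in row 0, so the maximum drops to 100.
tighter = df2.apply(close_cols_max, axis=1, thresh=3)
```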


Extra credit
I vectorized your closeCols2 and embedded it, for some mind-numbing fun.
Notice there is no apply.

  • numpy broadcasting gives every combination of columns subtracted from each other.
  • np.abs takes the absolute value of those differences.
  • <= 5 flags the differences that fall within the threshold.
  • sum(-1): I arranged the broadcasting so that the differences between, say, row 0, column AAA and every value in row 0 are laid out along the last dimension. The -1 in sum(-1) says to sum across that last dimension.
  • <= 1: every value is less than 5 away from itself, so each count is at least 1. A value has a neighbor within the threshold only if its count is greater than 1, so we mask every position whose count is less than or equal to 1.
v = df2.values
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)

    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
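To make the shapes easier to follow, the same computation can be split into named steps. This is a sketch of the expression above rather than a different method; the intermediate names diff, close and n_close are introduced here for illustration only. Note that this version uses <=, whereas closeCols2 uses a strict <.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {'AAA': [4, 5, 6, 7, 9, 10],
     'BBB': [10, 20, 30, 40, 11, 10],
     'CCC': [100, 50, 25, 10, 10, 11],
     'DDD': [98, 50, 25, 10, 10, 11],
     'EEE': [103, 50, 25, 10, 10, 11]})
thresh = 5

v = df2.to_numpy()                       # shape (6, 5): rows x columns

# (6, 5, 1) - (6, 1, 5) broadcasts to (6, 5, 5):
# diff[i, j, k] is row i's column j minus row i's column k.
diff = v[:, :, None] - v[:, None, :]

close = np.abs(diff) <= thresh           # boolean, shape (6, 5, 5)

# Count, for every cell, how many values in its row are within thresh.
# Each cell is always close to itself, so the count is at least 1.
n_close = close.sum(axis=-1)             # shape (6, 5)

# A count of 1 means "only close to itself", so mask those cells.
result = df2.mask(n_close <= 1)
print(result)
```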
