Modify function to return dataframe with specified values
Question
With reference to the test data below and the function I use to identify values within thresh of each other, can anyone please help me modify it to produce the desired output shown further down?
Test data
import pandas as pd
import numpy as np
from itertools import combinations

df2 = pd.DataFrame({
    'AAA': [4, 5, 6, 7, 9, 10],
    'BBB': [10, 20, 30, 40, 11, 10],
    'CCC': [100, 50, 25, 10, 10, 11],
    'DDD': [98, 50, 25, 10, 10, 11],
    'EEE': [103, 50, 25, 10, 10, 11],
})
Function:
thresh = 5

def closeCols2(df):
    # df is a single row here, because the function is applied with axis=1
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value
Data before applying the function:
AAA BBB CCC DDD EEE
0 4 10 100 98 103
1 5 20 50 50 50
2 6 30 25 25 25
3 7 40 10 10 10
4 9 11 10 10 10
5 10 10 11 11 11
Current Series output after applying the function:
df2.apply(closeCols2, axis=1)
0 103
1 50
2 25
3 10
4 11
5 11
dtype: int64
The desired output is a dataframe showing all values that are within thresh of another value in the same row, and a nan for any value that is not:
AAA BBB CCC DDD EEE
0 nan nan 100 98 103
1 nan nan 50 50 50
2 nan 30 25 25 25
3 7 nan 10 10 10
4 9 11 10 10 10
5 10 10 11 11 11
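Recommended answer
Use mask and sub. The apply call runs along axis=1 (row by row), and sub aligns the resulting Series with the index (axis=0):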
df2.mask(df2.sub(df2.apply(closeCols2, 1), 0).abs() > thresh)
AAA BBB CCC DDD EEE
0 NaN NaN 100 98 103
1 NaN NaN 50 50 50
2 NaN 30.0 25 25 25
3 7.0 NaN 10 10 10
4 9.0 11.0 10 10 10
5 10.0 10.0 11 11 11
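To make the one-liner above easier to follow, here is a minimal sketch that splits it into named steps. The names row_max, dist, and result are made up for illustration, and row_max is hard-coded from the Series output shown in the question rather than recomputed with closeCols2. Note that mask uses a strict `> thresh` test, so a value exactly thresh away from the row's reference value (such as 30 against 25) is kept, even though closeCols2 itself compares with `< thresh`.

```python
import pandas as pd

df2 = pd.DataFrame({
    'AAA': [4, 5, 6, 7, 9, 10],
    'BBB': [10, 20, 30, 40, 11, 10],
    'CCC': [100, 50, 25, 10, 10, 11],
    'DDD': [98, 50, 25, 10, 10, 11],
    'EEE': [103, 50, 25, 10, 10, 11],
})
thresh = 5

# Stand-in for df2.apply(closeCols2, axis=1): values copied from the
# Series output shown in the question (one reference value per row).
row_max = pd.Series([103, 50, 25, 10, 11, 11], index=df2.index)

# Subtract each row's reference value from every cell of that row.
# axis=0 tells sub to align the Series with the row index.
dist = df2.sub(row_max, axis=0).abs()

# mask() replaces cells with NaN wherever the condition is True.
result = df2.mask(dist > thresh)
print(result)
```

Columns that receive a NaN are upcast to float, which is why AAA and BBB print as 7.0, 30.0, and so on.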
Note:
I'd redefine closeCols2 to include thresh as a parameter. Then you could pass it in the apply call.
def closeCols2(df, thresh):
    # thresh is now an argument instead of a global
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value

df2.apply(closeCols2, axis=1, thresh=5)
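As a rough, self-contained illustration of passing the threshold through apply: DataFrame.apply forwards extra keyword arguments to the function, and positional extras can go through its args parameter. The variable names by_keyword, by_args, and tighter below are illustrative only.

```python
from itertools import combinations

import pandas as pd

df2 = pd.DataFrame({
    'AAA': [4, 5, 6, 7, 9, 10],
    'BBB': [10, 20, 30, 40, 11, 10],
    'CCC': [100, 50, 25, 10, 10, 11],
    'DDD': [98, 50, 25, 10, 10, 11],
    'EEE': [103, 50, 25, 10, 10, 11],
})

def closeCols2(df, thresh):
    # Largest value that belongs to some pair closer than thresh;
    # stays None if no pair in the row qualifies.
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            pair_max = max(df[k1], df[k2])
            max_value = pair_max if max_value is None else max(max_value, pair_max)
    return max_value

# Two equivalent ways to hand thresh to the function.
by_keyword = df2.apply(closeCols2, axis=1, thresh=5)
by_args = df2.apply(closeCols2, axis=1, args=(5,))

# A different threshold needs no change to the function itself.
tighter = df2.apply(closeCols2, axis=1, thresh=3)
print(by_keyword.tolist(), tighter.tolist())
```

With the threshold as a parameter, the function no longer depends on a module-level thresh being defined before it is called.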
Extra credit
I vectorized your closeCols2 and embedded it, for some mind-numbing fun.
Notice there is no apply.
- numpy broadcasting, to get every column subtracted from every other column within each row.
- np.abs, to take the absolute differences.
- <= 5, to flag the differences that are within the threshold.
- sum(-1), to count the flags. I arranged the broadcasting so that the differences between, say, row 0, column AAA and all of row 0 are laid out across the last dimension. The -1 in sum(-1) says to sum across the last dimension.
- <= 1, to build the mask. Every value is within 5 of itself, so each count is at least 1. A value is close to some other value only when its count is greater than 1, so we mask everything whose count is less than or equal to 1.
v = df2.values
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)
AAA BBB CCC DDD EEE
0 NaN NaN 100 98 103
1 NaN NaN 50 50 50
2 NaN 30.0 25 25 25
3 7.0 NaN 10 10 10
4 9.0 11.0 10 10 10
5 10.0 10.0 11 11 11
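The broadcasting steps listed above can be unpacked into intermediate arrays to check the shapes. This is only a sketch of the same expression; the names diff, close, and counts are introduced here for illustration and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'AAA': [4, 5, 6, 7, 9, 10],
    'BBB': [10, 20, 30, 40, 11, 10],
    'CCC': [100, 50, 25, 10, 10, 11],
    'DDD': [98, 50, 25, 10, 10, 11],
    'EEE': [103, 50, 25, 10, 10, 11],
})

v = df2.values                      # shape (6, 5): rows x columns

# (6, 5, 1) minus (6, 1, 5) broadcasts to (6, 5, 5):
# diff[r, i, j] is v[r, i] - v[r, j]
diff = v[:, :, None] - v[:, None]

# True where two values in the same row are at most 5 apart
close = np.abs(diff) <= 5

# For each cell, count how many values in its row are close to it.
# The count always includes the cell itself.
counts = close.sum(-1)              # shape (6, 5)

# A count of 1 means the cell is close only to itself, so mask it.
result = df2.mask(counts <= 1)

print(diff.shape, counts.shape)
print(counts[0])
print(result)
```

The intermediate array has one entry per row and per pair of columns, so memory use grows with the square of the column count; for a handful of columns, as here, that is negligible.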