Compare the preceding two rows with the subsequent two rows of each group until the last record

Problem description

I asked this question earlier; it was deleted, and I have rewritten it in a less verbose form so that it is easier to read.

I have a dataframe as shown below:

import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
    'day': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
    'PEEP': [7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})
df['fake_flag'] = ''

I would like to fill the fake_flag column according to the following rules:

1) If the preceding two rows are constant (e.g. 5, 5) or decreasing (e.g. 7, 5), pick the higher of the two values. In these cases that is 7 from (7, 5) and 5 from (5, 5).

2) Check whether the current row is greater than the rule 1 output by 3 or more points (>= 3) and whether the same value repeats in the next row (two consecutive occurrences). For example, if the rule 1 output is 5, the current value can be 8 or greater: 8 in row n and 8 in row n+1, or 10 in row n and 10 in row n+1. If so, enter fake VAC in the fake_flag column.
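To make the two rules concrete, here is a minimal sketch of the check for a single position, written as a plain function. The name `is_fake_vac` and its four arguments are hypothetical and only illustrate one reading of the rules above.

```python
def is_fake_vac(prev2, prev1, current, nxt):
    """Check one row, given the two preceding values and the next value.

    prev2 is the value at row n-2, prev1 at row n-1,
    current at row n, and nxt at row n+1.
    """
    # Rule 1: the two preceding rows must be constant or decreasing.
    if prev2 < prev1:
        return False
    # The rule 1 output is the higher of the two preceding values.
    baseline = max(prev2, prev1)
    # Rule 2: current is at least 3 above the baseline and repeats in the next row.
    return current >= baseline + 3 and current == nxt


print(is_fake_vac(7, 5, 10, 10))    # True: 10 >= 7 + 3 and 10 repeats
print(is_fake_vac(10, 10, 11, 11))  # False: 11 < 10 + 3
```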

This is what I tried:

t1 = df['PEEP']  # assumption: t1 is not defined in the original post; the PEEP column is assumed
for i in t1.index:
    if i >= 2:
        print("current value is  ", t1[i])
        print("preceding 1st (n-1) ", t1[i-1])
        print("preceding 2nd (n-2) ", t1[i-2])
        if (t1[i-1] == t1[i-2] or t1[i-2] >= t1[i-1]): # rule 1 check
            r1_output = t1[i-2] # the max of the two values is t1[i-2]; when they are constant, both are equal anyway
            print("rule 1 output is ", r1_output)
            if t1[i] >= r1_output + 3 and i + 1 in t1.index: # skip the last row, which has no next value
                print("found a value for rule 2", t1[i])
                print("check for next value is same as current value", t1[i+1])
                if (t1[i]==t1[i+1]): # rule 2 check
                    print("fake flag is being set")
                    df.loc[i, 'fake_flag'] = 'fake_vac' # .loc avoids chained assignment

This check should be applied to every record, one by one, within each subject_id. My dataset has a million records, so I cannot run a loop over all of them. Any efficient and elegant solution would be helpful.
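Because the check must stay within each subject_id, a plain column-wide `shift` is not enough: it carries the last value of one subject into the first row of the next. The toy frame below is made up for illustration (`toy`, `plain_prev`, and `group_prev` are hypothetical names); it contrasts a plain shift with a shift applied per group.

```python
import pandas as pd

# Toy data: two subjects with three records each (illustrative values only).
toy = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 2],
    "PEEP": [7, 5, 10, 5, 7, 8],
})

# Plain shift: the first row of subject 2 receives the last value of subject 1.
toy["plain_prev"] = toy["PEEP"].shift(1)

# Grouped shift: the first row of every subject is NaN instead.
toy["group_prev"] = toy.groupby("subject_id")["PEEP"].shift(1)

print(toy)
```

In the printed frame, row 3 (the first record of subject 2) has 10 in `plain_prev` but NaN in `group_prev`.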

I expect my output to look like this:

subject_id = 1

subject_id = 2

Recommended answer

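import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
    'day': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
    'PEEP': [7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})

# Previous value (n-1) and the one before it (n-2).
# Note: shift() is applied to the whole column here, not per subject_id.
df['shift1'] = df['PEEP'].shift(1)
df['shift2'] = df['PEEP'].shift(2)

# Flag rows whose two preceding values are constant or decreasing (rule 1 only).
df['fake_flag'] = np.where((df['shift1'] == df['shift2']) | (df['shift1'] < df['shift2']), 'fake VAC', '')
# drop() returns a new frame, so assign the result back.
df = df.drop(['shift1', 'shift2'], axis=1)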

Output

    subject_id  day  PEEP  fake_flag
0   1   1   7   
1   1   2   5   
2   1   3   10  fake VAC
3   1   4   10  
4   1   5   11  fake VAC
5   1   6   11  
6   1   7   14  fake VAC
7   1   8   14  
8   1   9   17  fake VAC
9   1   10  17  
10  1   11  21  fake VAC
11  1   12  21  
12  1   13  23  fake VAC
13  1   14  23  
14  1   15  25  fake VAC
15  1   16  25  
16  1   17  22  fake VAC
17  1   18  20  fake VAC
18  1   19  26  fake VAC
19  1   20  26  
20  2   1   5   fake VAC
21  2   2   7   fake VAC
22  2   3   8   
23  2   4   8   
24  2   5   9   fake VAC
25  2   6   9   
26  2   7   13  fake VAC
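The recommended snippet tests only the rule 1 condition and shifts across the whole column, so rows such as day 5 of subject 1 (11 after 10, 10) and the first rows of subject 2 are flagged as well. The following is a sketch, not the original answerer's method, of one way to apply both rules per subject without a Python loop. The function name `flag_fake_vac` and its parameters are hypothetical, and it assumes the rows are already sorted by day within each subject_id.

```python
import numpy as np
import pandas as pd


def flag_fake_vac(df, group_col="subject_id", value_col="PEEP"):
    """Return a copy of df with a fake_flag column set by rules 1 and 2."""
    grouped = df.groupby(group_col)[value_col]
    prev1 = grouped.shift(1)   # value at row n-1 within the same subject
    prev2 = grouped.shift(2)   # value at row n-2 within the same subject
    nxt = grouped.shift(-1)    # value at row n+1 within the same subject
    cur = df[value_col]

    # Rule 1: preceding two rows constant or decreasing; the higher value is prev2.
    # Comparisons involving NaN are False, so group edges are never flagged.
    rule1 = prev2 >= prev1
    # Rule 2: current is at least 3 above the rule 1 output and repeats in the next row.
    rule2 = (cur >= prev2 + 3) & (cur == nxt)

    out = df.copy()
    out["fake_flag"] = np.where(rule1 & rule2, "fake VAC", "")
    return out


df = pd.DataFrame({
    "subject_id": [1] * 20 + [2] * 20,
    "day": list(range(1, 21)) * 2,
    "PEEP": [7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,
             5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15],
})
result = flag_fake_vac(df)
print(result[result["fake_flag"] != ""])
```

All comparisons are vectorized, so the cost is a few passes over the column rather than a Python-level loop, which matters at the scale of a million rows. Whether this matches the intended output depends on how the rules are read; the expected tables from the question are not available here.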
