How to vectorize code with nested if and loops in Python?


Question

I have a dataframe like the one shown below:

import pandas as pd

df = pd.DataFrame({
    'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
    'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
    'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]
})
df['fake_flag'] = ''  # empty string means "not flagged"

The code below performs the operation I need. It works and produces the expected output, but I can't use this approach on my real dataset, which has more than a million records.

t1 = df['PEEP']
for i in t1.index:
    if i >= 2:
        print("current value is  ", t1[i])
        print("preceding 1st (n-1) ", t1[i-1])
        print("preceding 2nd (n-2) ", t1[i-2])
        if (t1[i-1] == t1[i-2] or t1[i-2] >= t1[i-1]):
            r1_output = t1[i-2]  # the max of the two preceding values; when they are equal, either one works
            print("rule 1 output is ", r1_output)
            if t1[i] >= r1_output + 3:
                print("found a value for rule 2", t1[i])
                next_value = t1.get(i + 1)  # None on the last row instead of a KeyError
                print("check for next value is same as current value", next_value)
                if next_value is not None and t1[i] == next_value:
                    print("fake flag is being set")
                    df.loc[i, 'fake_flag'] = 'fake_vac'  # .loc avoids chained assignment

I am still learning Python. Can you help me understand how to vectorize this code?

You can refer to this related post to understand the logic. Since the logic is already correct, I created this post mainly to get help vectorizing and speeding up my code.

I expect my output to look like the following:

subject_id = 1 (expected-output table not preserved)

subject_id = 2 (expected-output table not preserved)

Is there an efficient and elegant way to speed up my code so that it can handle a dataset with a million records?

Recommended answer

I'm not sure what the story behind this is, but you can certainly vectorize each of the three if conditions independently and then combine them:

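import numpy as np

# Rule 1: the value two rows back is >= the value one row back
con1 = t1.shift(2).ge(t1.shift(1))
# Rule 2: the current value is at least 3 above the value two rows back
con2 = t1.ge(t1.shift(2).add(3))
# Rule 3: the next value equals the current value
con3 = t1.eq(t1.shift(-1))

df['fake_flag'] = np.where(con1 & con2 & con3, 'fake VAC', '')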
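
As a sanity check, the three shifted conditions can be compared against a row-by-row version of the same rules on the sample PEEP values. This is only a sketch: the helper names `flag_with_loop` and `flag_vectorized` are made up for illustration, and the loop is a condensed restatement of the question's nested ifs rather than the original code.

```python
import numpy as np
import pandas as pd

# Sample PEEP values from the question (subject 1 followed by subject 2)
peep = pd.Series([7, 5, 10, 10, 11, 11, 14, 14, 17, 17,
                  21, 21, 23, 23, 25, 25, 22, 20, 26, 26,
                  5, 7, 8, 8, 9, 9, 13, 13, 15, 15,
                  12, 12, 15, 15, 19, 19, 19, 22, 22, 15])

def flag_with_loop(s):
    """Hypothetical reference: apply the three rules one row at a time."""
    values = s.to_numpy()
    flags = np.zeros(len(values), dtype=bool)
    # Rows 0, 1 and the last row lack a needed neighbor, so they stay False
    for i in range(2, len(values) - 1):
        rule1 = values[i - 2] >= values[i - 1]
        rule2 = values[i] >= values[i - 2] + 3
        rule3 = values[i] == values[i + 1]
        flags[i] = rule1 and rule2 and rule3
    return flags

def flag_vectorized(s):
    """The answer's approach: one boolean Series per rule, combined with &."""
    con1 = s.shift(2).ge(s.shift(1))
    con2 = s.ge(s.shift(2).add(3))
    con3 = s.eq(s.shift(-1))
    # Comparisons against the NaN produced by shift evaluate to False
    return (con1 & con2 & con3).to_numpy()

loop_flags = flag_with_loop(peep)
vec_flags = flag_vectorized(peep)
flagged_rows = np.flatnonzero(vec_flags).tolist()
print(flagged_rows)
print((loop_flags == vec_flags).all())
```

Both versions should flag the same rows here. Note that this ungrouped form lets the shifts cross from one subject into the next, which happens not to change the result on this particular sample.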

Edit (group by subject_id)

# Same three rules, applied within each subject so the shifts do not cross subject boundaries
con = lambda x: (x.shift(2).ge(x.shift(1))) & (x.ge(x.shift(2).add(3))) & (x.eq(x.shift(-1)))

df['fake_flag'] = df.groupby('subject_id')['PEEP'].transform(con).map({True:'fake VAC',False:''})
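
A possible alternative to the lambda, sketched here on the sample data, is to call `shift` directly on the grouped column. The variable names `prev1`, `prev2`, `nxt`, and `mask` are illustrative, and this is a variation on the answer rather than the answer's own code.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame in a compact form
df = pd.DataFrame({
    'subject_id': [1] * 20 + [2] * 20,
    'day': list(range(1, 21)) * 2,
    'PEEP': [7, 5, 10, 10, 11, 11, 14, 14, 17, 17,
             21, 21, 23, 23, 25, 25, 22, 20, 26, 26,
             5, 7, 8, 8, 9, 9, 13, 13, 15, 15,
             12, 12, 15, 15, 19, 19, 19, 22, 22, 15],
})

grouped = df.groupby('subject_id')['PEEP']
prev1 = grouped.shift(1)   # previous value within the same subject
prev2 = grouped.shift(2)   # value two rows back within the same subject
nxt = grouped.shift(-1)    # next value within the same subject

# The first two rows and the last row of each subject have NaN neighbors,
# so their comparisons are False and they are never flagged
mask = prev2.ge(prev1) & df['PEEP'].ge(prev2 + 3) & df['PEEP'].eq(nxt)

df['fake_flag'] = np.where(mask, 'fake VAC', '')
print(df.loc[mask, ['subject_id', 'day', 'PEEP']])
```

Because the grouped shifts return Series aligned to the original index, the mask can be passed straight to `np.where` without a `map` step.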
