Retain function in Python


Question

I have recently been moving from SAS to Python pandas. Does pandas have anything like the RETAIN statement in SAS, so that I can refer to the previous record dynamically? In the code below I have to loop over every row manually and reference the previous record. It seems quite slow compared with the equivalent SAS program. Is there a way to make this more efficient in pandas? Thank you.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 1, 1, 1], 'B': [0, 0, 1, 0]})
df['C'] = np.nan
df['lag_C'] = np.nan
# Walk the rows in order; assumes the default integer index (0, 1, 2, ...).
for row in df.index:
    if row == df.index[0]:
        # First row: C is 1 when A is 0, otherwise 0.
        df.loc[row, 'C'] = int(df.loc[row, 'A'] == 0)
    else:
        if df.loc[row, 'B'] == 1:
            df.loc[row, 'C'] = 1
        elif df.loc[row, 'lag_C'] == 0:
            df.loc[row, 'C'] = 0
        elif df.loc[row, 'lag_C'] != 0:
            df.loc[row, 'C'] = df.loc[row, 'lag_C'] + 1
    # Carry the current C forward to the next row's lag_C.
    if row != df.index[-1]:
        df.loc[row + 1, 'lag_C'] = df.loc[row, 'C']

Recommended answer

The algorithm is quite complicated, but I tried a vectorized approach.
If I understand it correctly, a cumulative sum can be used, as was done in a related question. The last column, lag_C, is column C shifted by one row.
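The cumulative-sum idea can be illustrated on a small example. This is only a sketch: the `toy` frame and the `counter` column are made-up names, and `cumcount() + 1` is used here as a shorthand for taking a per-group cumulative sum over a column of ones.

```python
import pandas as pd

# Hypothetical toy data: B == 1 marks the rows where the counter restarts.
toy = pd.DataFrame({'B': [0, 0, 1, 0, 0, 1, 0]})

# The running sum of B increases by one at every reset row, so it labels
# each run of rows that belongs to the same counting group.
group_id = toy['B'].cumsum()  # 0, 0, 1, 1, 1, 2, 2

# cumcount() numbers the rows inside each group from 0; adding 1 makes the
# reset row itself count as 1.
toy['counter'] = toy.groupby(group_id).cumcount() + 1  # 1, 2, 1, 2, 3, 1, 2
print(toy)
```

Note that the rows before the first reset are counted as well (1, 2 here), which is not always what the row-by-row rules in the question produce.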

However, my algorithm cannot be applied to the first rows of df, because only those rows are determined by the first value of column A and, in some cases, the first value of column B. So I created a helper column D that marks these rows; they are later copied to the output column C if the conditions are true.

I changed the input data and tested the problematic first rows. I tried to test all three possibilities for the first 3 rows of column B together with the first row of column A.
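One way to check the first-row cases systematically is to compare a vectorized result against a plain row-by-row reference for every combination of the first three values of B and both values of the first A. The sketch below is not the code from this answer: `reference_c` and `vectorized_c` are hypothetical helpers, the remaining rows are arbitrary test data, and `vectorized_c` handles the first group with a boolean mask instead of a helper column. It assumes a non-empty frame whose columns A and B contain only 0 and 1.

```python
from itertools import product

import pandas as pd


def reference_c(a, b):
    """Row-by-row rules from the question, reading the previous C from the list."""
    c = []
    for i, (a_i, b_i) in enumerate(zip(a, b)):
        if i == 0:
            value = int(a_i == 0)
        elif b_i == 1:
            value = 1
        elif c[-1] == 0:
            value = 0
        else:
            value = c[-1] + 1
        c.append(value)
    return c


def vectorized_c(df):
    """Hypothetical mask-based variant of the cumulative-sum approach."""
    group_id = df['B'].cumsum()
    counter = df.groupby(group_id).cumcount() + 1
    # Rows up to (but not including) the next reset after row 0.
    first_group = group_id == group_id.iloc[0]
    if df['A'].iloc[0] != 0:
        # The first row starts at 0, so the whole first group stays at 0.
        counter = counter.where(~first_group, 0)
    return counter.tolist()


# Arbitrary rows appended after the three rows under test.
tail_a = [1, 1, 0, 0, 1]
tail_b = [0, 1, 0, 0, 1]

mismatches = []
for a0 in (0, 1):
    for head_b in product((0, 1), repeat=3):
        a = [a0, 1, 1] + tail_a
        b = list(head_b) + tail_b
        frame = pd.DataFrame({'A': a, 'B': b})
        if vectorized_c(frame) != reference_c(a, b):
            mismatches.append((a0, head_b))

# An empty list means both versions agree on every tested combination.
print(mismatches)
```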

My input conditions are:
Columns A and B contain only 1 or 0. Columns C and lag_C are helper columns containing only NaN.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1,1,1,1,1,0,0,1,1,0,0], 'B': [0,0,1,1,0,0,0,1,0,1,0]})
df1 = pd.DataFrame({'A': [1,1,1,1,1,0,0,1,1,0,0], 'B': [0,0,1,1,0,0,0,1,0,1,0]})

# Cumulative sum of column B: it increases by one at every row where B == 1,
# so it labels the groups of rows between resets.
df1['C'] = df1['B'].cumsum()
df1['lag_C'] = 1
# The first group (minimum label) is problematic; copy its B values to
# helper column D for later use.
df1.loc[df1['C'] == df1['C'].min(), 'D'] = df1['B']
# Running count within each group, written back to column C.
df1['C'] = df1.groupby(['C'])['lag_C'].cumsum()
# Correct the problematic first group in column C using the values from D.
if df1['A'].loc[0] == 1:
    df1.loc[df1['D'].notnull(), 'C'] = df1['D']
if (df1['A'].loc[0] == 1) and (df1['B'].loc[0] == 1):
    df1.loc[df1['D'].notnull(), 'C'] = 0
del df1['D']
# lag_C is column C shifted down by one row.
df1['lag_C'] = df1['C'].shift(1)
print(df1)
#    A  B  C  lag_C
#0   1  0  0    NaN
#1   1  0  0      0
#2   1  1  1      0
#3   1  1  1      1
#4   1  0  2      1
#5   0  0  3      2
#6   0  0  4      3
#7   1  1  1      4
#8   1  0  2      1
#9   0  1  1      2
#10  0  0  2      1
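As a complement to the vectorized version, a value can also be carried from row to row, in the spirit of a retained variable, without touching the DataFrame cell by cell. The sketch below is an alternative technique, not part of the answer above: it uses `itertools.accumulate` with its `initial` argument (Python 3.8 or later), and `step` and `retain_c` are hypothetical helper names. It assumes non-empty inputs.

```python
from itertools import accumulate

import pandas as pd


def step(prev_c, b_i):
    """One row of the question's rule; prev_c is the value carried over."""
    if b_i == 1:
        return 1
    if prev_c == 0:
        return 0
    return prev_c + 1


def retain_c(a, b):
    """Hypothetical helper: scan B once, carrying C forward between rows."""
    first_c = int(a[0] == 0)  # the first row depends only on A
    return list(accumulate(b[1:], step, initial=first_c))


df = pd.DataFrame({'A': [1, 1, 1, 1], 'B': [0, 0, 1, 0]})
df['C'] = retain_c(df['A'].tolist(), df['B'].tolist())
df['lag_C'] = df['C'].shift(1)
print(df['C'].tolist())  # [0, 0, 1, 2]
```

The loop still runs in Python, but it works on plain lists and writes each column once, which avoids the per-cell `df.loc` reads and writes of the original loop.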
