Python Pandas Running Totals with Resets

Problem description

I would like to perform the following task. Given two columns (good and bad), I would like to replace certain rows of the two columns with a running total. Here is an example of the current dataframe along with the desired dataframe.

I should have added what my intentions are. I am trying to create an equally binned variable (20 bins in this case) using a continuous variable as the input. I know the pandas cut and qcut functions are available, but the returned bins can contain zeros in the good or bad counts, which are needed to compute the weight of evidence and information value. A zero in either the numerator or the denominator will not allow the mathematical calculations to work.

import pandas as pd

d={'AAA':range(0,20),
   'good':[3,3,13,20,28,32,59,72,64,52,38,24,17,19,12,5,7,6,2,0],
   'bad':[0,0,1,1,1,0,6,8,10,6,6,10,5,8,2,2,1,3,1,1]}
df=pd.DataFrame(data=d)
print(df)
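To illustrate why the zeros mentioned above are a problem, here is a minimal sketch of a weight of evidence (WoE) and information value (IV) calculation on the same counts. The helper name `woe_iv` and the convention WoE = ln(share of good / share of bad) are assumptions made for this example; some references use the inverse ratio.

```python
import numpy as np
import pandas as pd

def woe_iv(good, bad):
    """Hypothetical helper: WoE per bin and total IV, using ln(%good / %bad)."""
    good = pd.Series(good, dtype="float64")
    bad = pd.Series(bad, dtype="float64")
    good_dist = good / good.sum()   # share of all goods that falls in each bin
    bad_dist = bad / bad.sum()      # share of all bads that falls in each bin
    # Silence NumPy warnings so the non-finite results can be inspected directly.
    with np.errstate(divide="ignore", invalid="ignore"):
        woe = np.log(good_dist / bad_dist)
        iv = ((good_dist - bad_dist) * woe).sum()
    return woe, iv

good = [3, 3, 13, 20, 28, 32, 59, 72, 64, 52, 38, 24, 17, 19, 12, 5, 7, 6, 2, 0]
bad = [0, 0, 1, 1, 1, 0, 6, 8, 10, 6, 6, 10, 5, 8, 2, 2, 1, 3, 1, 1]

woe, iv = woe_iv(good, bad)
print(woe)  # bins with a zero count come out as inf or -inf
print(iv)   # a single non-finite bin makes the total IV non-finite as well
```

Bins with a zero bad count give a division by zero, and bins with a zero good count give a logarithm of zero, so the affected bins have to be merged before the calculation is meaningful.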

Here is an explanation of what I need to do to the above dataframe.

Roughly speaking, any time I encounter a zero in either column, I need to keep a running total of the non-zero column until the next row that has a non-zero value in the column that contained the zero.

Here is the desired output:

# Input rows 0-2, 4-5 and 18-19 are each merged into one row, so no zeros remain.
dd={'AAA':range(0,16),
    'good':[19,20,60,59,72,64,52,38,24,17,19,12,5,7,6,2],
    'bad':[1,1,1,6,8,10,6,6,10,5,8,2,2,1,3,2]}

desired_df=pd.DataFrame(data=dd)
print(desired_df)

Recommended answer

P.Tillmann, I appreciate your assistance with this. I assume the more advanced readers will find this code appalling, as I do. I would be more than happy to take any recommendation that makes it more streamlined.

import pandas as pd

d={'AAA':range(0,20),
  'good':[3,3,13,20,28,32,59,72,64,52,38,24,17,19,12,5,7,6,2,0],
  'bad':[0,0,1,1,1,0,6,8,10,6,6,10,5,8,2,2,1,3,1,1]}
df=pd.DataFrame(data=d)
print(df)

# Running totals and bookkeeping for the bin currently being built.
row_good=0
row_bad=0
row_bad_zero_count=0
row_good_zero_count=0
row_out='NO'
# Collect output rows in a list; DataFrame.append was removed in pandas 2.0.
fixed_rows=[]
for index,row in df.iterrows():
    if row['good']==0 or row['bad']==0:
        # Case 1: this row contains a zero, so add it to the running totals.
        row_bad += row['bad']
        row_good += row['good']
        row_bad_zero_count += 1
        row_good_zero_count += 1
        output_ind='1'
        row_out='NO'
    elif index+1 < len(df) and (df.loc[index+1,'good']==0 or df.loc[index+1,'bad']==0):
        # Case 2: the next row contains a zero, so start a new total and hold it back.
        row_bad=row['bad']
        row_good=row['good']
        output_ind='2'
        row_out='NO'
    elif (row_bad_zero_count > 1 or row_good_zero_count > 1) and row['good']!=0 and row['bad']!=0:
        # Case 3: first zero-free row after a run of zero rows; close the total.
        row_bad += row['bad']
        row_good += row['good']
        row_bad_zero_count=0
        row_good_zero_count=0
        row_out='YES'
        output_ind='3'
    else:
        # Case 4: ordinary zero-free row; output it unchanged.
        row_bad=row['bad']
        row_good=row['good']
        row_bad_zero_count=0
        row_good_zero_count=0
        row_out='YES'
        output_ind='4'

    # A zero row can be output once the totals carried into it are both non-zero.
    if ((row['good']==0 or row['bad']==0)
        and (index > 0 and (df.loc[index-1,'good']!=0 or df.loc[index-1,'bad']!=0))
        and row_good != 0 and row_bad != 0):
        row_out='YES'

    if row_out=='YES':
        temp_dict={'AAA':row['AAA'],
                   'good':row_good,
                   'bad':row_bad}
        fixed_rows.append(temp_dict)
        # Debug trace: original values, running totals, zero counts, branch taken.
        print(str(row['AAA']),'-',
              str(row['good']),'-',
              str(row['bad']),'-',
              str(row_good),'-',
              str(row_bad),'-',
              str(row_good_zero_count),'-',
              str(row_bad_zero_count),'-',
              row_out,'-',
              output_ind)

crappy_fix=pd.DataFrame(fixed_rows)
print(crappy_fix)
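As one possible way to streamline the loop above, here is a vectorized sketch using `cumsum` and `groupby`. It is not the original author's method. It assumes a merging rule inferred from the desired output: a row containing a zero is folded into the nearest preceding zero-free row, and leading zero rows are folded into the first zero-free row. The names `has_zero`, `group_id` and `merged` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'AAA': range(0, 20),
    'good': [3, 3, 13, 20, 28, 32, 59, 72, 64, 52, 38, 24, 17, 19, 12, 5, 7, 6, 2, 0],
    'bad': [0, 0, 1, 1, 1, 0, 6, 8, 10, 6, 6, 10, 5, 8, 2, 2, 1, 3, 1, 1],
})

# True where a row has a zero in either column.
has_zero = (df['good'] == 0) | (df['bad'] == 0)

# Every zero-free row starts a new group; a zero row inherits the id of the
# zero-free row before it. Leading zero rows get id 0, so clip them into
# group 1, which belongs to the first zero-free row.
group_id = (~has_zero).cumsum().clip(lower=1)

# Sum the counts within each group and renumber the bins from 0.
merged = df.groupby(group_id)[['good', 'bad']].sum().reset_index(drop=True)
merged.insert(0, 'AAA', range(len(merged)))
print(merged)
```

On the sample data this yields the 16 rows of the desired output. If every row contained a zero, all rows would collapse into a single group that could still hold a zero, so that case would need separate handling.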
