Add new columns to a pandas df after filtering


Question

I have a df that contains information about various places.

import pandas as pd

d = ({
    'C' : ['08:00:00','XX','08:10:00','XX','08:41:42','XX','08:50:00','XX', '09:00:00', 'XX','09:15:00','XX','09:21:00','XX','09:30:00','XX','09:40:00','XX'],
    'D' : ['Home','','Home','','Away','','Home','','Away','','Home','','Home','','Away','','Home',''],
    'E' : ['Num:','','Num:','','Num:','','Num:','','Num:', '','Num:','','Num:','','Num:', '','Num:', ''],
    'F' : ['1','','1','','1','','1','','1', '','2','','2','','1', '','2',''],   
    'A' : ['A','','A','','A','','A','','A','','A','','A','','A','','A',''],           
    'B' : ['Stop','','Res','','Stop','','Start','','Res','','Stop','','Res','','Start','','Start','']
    })

df = pd.DataFrame(data=d)

I want to export that data into their respective places, which are labelled in Column D. I also want to add new columns based on the functions labelled in Column B.

df['C'] = pd.to_timedelta(df['C'], errors="coerce").dt.total_seconds()

incl = ['Home', 'Away']    

for k, g in df[df.D.isin(incl)].groupby('D'):
    Stop = g.loc[df['B'] == 'Stop'].reset_index()['C']
    Start = g.loc[df['B'] == 'Start'].reset_index()['C']
    Res = g.loc[df['B'] == 'Res'].reset_index()['C']

    g['Start_diff'] = Start - Stop
    g['Res_diff'] = Start - Res

The problem is that these functions occur multiple times, as numbered in Column F. So if we look at the export for Home, we only get the diff for the first occurrence in Column F.

Output:

    A   B       C       D       E       F   Start_diff  Res_diff
0   A   Stop    28800   Home    Num:    1   3000        2400
2   A   Res     29400   Home    Num:    1       
6   A   Start   31800   Home    Num:    1       
10  A   Stop    33300   Home    Num:    2       
12  A   Res     33660   Home    Num:    2       
16  A   Start   34800   Home    Num:    2       

My intended output is:

    A   B       C       D       E       F   Start_diff  Res_diff
0   A   Stop    28800   Home    Num:    1   3000        2400
2   A   Res     29400   Home    Num:    1       
6   A   Start   31800   Home    Num:    1       
10  A   Stop    33300   Home    Num:    2   1500        1140
12  A   Res     33660   Home    Num:    2       
16  A   Start   34800   Home    Num:    2       

I tried changing for k, g in df[df.D.isin(incl)].groupby('D'): to for k, g in df[df.D.isin(incl)].groupby('D').F.nunique():

But I get the error TypeError: 'numpy.int64' object is not iterable

Recommended answer
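
On the TypeError in the question: groupby('D').F.nunique() is an aggregation, so it returns a Series holding one integer per value of D rather than something that yields (key, sub-frame) pairs. The following is a minimal sketch of that difference; the toy frame is made-up data for illustration only, not the question's df.

```python
import pandas as pd

# Toy frame (hypothetical data, not the question's df).
toy = pd.DataFrame({
    'D': ['Home', 'Home', 'Away', 'Home'],
    'F': ['1', '1', '1', '2'],
})

# nunique() aggregates: one integer per value of D, not a groupby object.
counts = toy.groupby('D')['F'].nunique()
print(counts)

# Iterating a Series yields its values, so `for k, g in counts` tries to
# unpack a single integer into two names and raises TypeError.
try:
    for k, g in counts:
        pass
except TypeError as exc:
    print('TypeError:', exc)

# Grouping by both columns yields (key, sub-frame) pairs, one per D/F pair.
keys = [key for key, g in toy.groupby(['D', 'F'])]
print(keys)
```

The exact wording of the TypeError message can differ between Python versions, but the cause is the same. Grouping by both D and F gives one group per repetition.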

I believe you need a custom function applied with groupby on columns D and F, and then mask to replace the duplicated values with NaN:

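def f(g):
    g = g.copy()  # work on a copy so the group passed in is not modified
    # Build the masks from the group's own B column, not the global df
    Stop = g.loc[g['B'] == 'Stop', 'C']
    Start = g.loc[g['B'] == 'Start', 'C']
    Res = g.loc[g['B'] == 'Res', 'C']
    # First Start time minus first Stop/Res time, written to every row of the group
    g['Start_diff'] = Start.values[0] - Stop.values[0]
    g['Res_diff'] = Start.values[0] - Res.values[0]
    return g

# group_keys=False keeps the original row labels as the index
df = df[df.D.isin(incl)].groupby(['D', 'F'], group_keys=False).apply(f)

# Keep the diffs only on the first row of each D/F group
df[['Start_diff', 'Res_diff']] = df[['Start_diff', 'Res_diff']].mask(df.duplicated(['D','F']))
print (df)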
          C     D     E  F  A      B  Start_diff  Res_diff
0   28800.0  Home  Num:  1  A   Stop      3000.0    2400.0
2   29400.0  Home  Num:  1  A    Res         NaN       NaN
4   31302.0  Away  Num:  1  A   Stop      2898.0    1800.0
6   31800.0  Home  Num:  1  A  Start         NaN       NaN
8   32400.0  Away  Num:  1  A    Res         NaN       NaN
10  33300.0  Home  Num:  2  A   Stop      1500.0    1140.0
12  33660.0  Home  Num:  2  A    Res         NaN       NaN
14  34200.0  Away  Num:  1  A  Start         NaN       NaN
16  34800.0  Home  Num:  2  A  Start         NaN       NaN
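
The behaviour of groupby.apply has shifted between pandas releases (for example, in how group keys appear in the result), so an explicit loop over the D/F groups may be easier to reason about. The following is a sketch of that alternative, not the approach used above. It assumes every D/F group contains at least one Stop, Start and Res row, and it rebuilds only the Home/Away rows of the question's data (the 'XX' filler rows are left out, so the row labels run 0..8 rather than 0..16). The names out, t and first_row are illustrative.

```python
import numpy as np
import pandas as pd

# Home/Away rows from the question, with the 'XX' filler rows left out.
df = pd.DataFrame({
    'C': ['08:00:00', '08:10:00', '08:41:42', '08:50:00', '09:00:00',
          '09:15:00', '09:21:00', '09:30:00', '09:40:00'],
    'D': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home', 'Home', 'Away', 'Home'],
    'F': ['1', '1', '1', '1', '1', '2', '2', '1', '2'],
    'B': ['Stop', 'Res', 'Stop', 'Start', 'Res', 'Stop', 'Res', 'Start', 'Start'],
})
df['C'] = pd.to_timedelta(df['C']).dt.total_seconds()

out = df[df['D'].isin(['Home', 'Away'])].copy()
out['Start_diff'] = np.nan
out['Res_diff'] = np.nan

for (place, num), g in out.groupby(['D', 'F']):
    # First recorded time for each label in this D/F group
    # (assumes Stop, Start and Res each occur at least once).
    t = g.groupby('B')['C'].first()
    # Write the diffs on the first row of the group only; the rest stay NaN.
    first_row = g.index[0]
    out.loc[first_row, 'Start_diff'] = t['Start'] - t['Stop']
    out.loc[first_row, 'Res_diff'] = t['Start'] - t['Res']

print(out)
```

Because the diffs are written only to the first row of each group, no separate mask step is needed afterwards.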
