Add new columns to a pandas df after filtering
Question
I have a df that contains information about various places.
import pandas as pd
d = ({
'C' : ['08:00:00','XX','08:10:00','XX','08:41:42','XX','08:50:00','XX', '09:00:00', 'XX','09:15:00','XX','09:21:00','XX','09:30:00','XX','09:40:00','XX'],
'D' : ['Home','','Home','','Away','','Home','','Away','','Home','','Home','','Away','','Home',''],
'E' : ['Num:','','Num:','','Num:','','Num:','','Num:', '','Num:','','Num:','','Num:', '','Num:', ''],
'F' : ['1','','1','','1','','1','','1', '','2','','2','','1', '','2',''],
'A' : ['A','','A','','A','','A','','A','','A','','A','','A','','A',''],
'B' : ['Stop','','Res','','Stop','','Start','','Res','','Stop','','Res','','Start','','Start','']
})
df = pd.DataFrame(data=d)
I want to export that data into their respective places, which are labelled in Column D. I also want to add new columns based on the functions labelled in Column B.
df['C'] = pd.to_timedelta(df['C'], errors="coerce").dt.total_seconds()
incl = ['Home', 'Away']
for k, g in df[df.D.isin(incl)].groupby('D'):
    Stop = g.loc[df['B'] == 'Stop'].reset_index()['C']
    Start = g.loc[df['B'] == 'Start'].reset_index()['C']
    Res = g.loc[df['B'] == 'Res'].reset_index()['C']
    g['Start_diff'] = Start - Stop
    g['Res_diff'] = Start - Res
The problem is that these functions occur multiple times, as labelled in Column F. So if we look at the export for Home, we only get the diff for the first value in Column F.
Output:
    A      B      C     D     E  F  Start_diff  Res_diff
0   A   Stop  28800  Home  Num:  1        3000      2400
2   A    Res  29400  Home  Num:  1
6   A  Start  31800  Home  Num:  1
10  A   Stop  33300  Home  Num:  2
12  A    Res  33660  Home  Num:  2
16  A  Start  34800  Home  Num:  2
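The single filled row is consistent with pandas aligning on index labels when a Series is assigned as a column. The sketch below reproduces that effect on a toy stand-in for the Home group; the frame `g` is rebuilt by hand here as an assumption, it is not the asker's full data.

```python
import pandas as pd

# Toy stand-in for the Home group; the original, non-contiguous row labels are kept.
g = pd.DataFrame(
    {"B": ["Stop", "Res", "Start", "Stop", "Res", "Start"],
     "C": [28800.0, 29400.0, 31800.0, 33300.0, 33660.0, 34800.0]},
    index=[0, 2, 6, 10, 12, 16],
)

# reset_index() relabels each selection 0, 1, ...
stop = g.loc[g["B"] == "Stop", "C"].reset_index(drop=True)
start = g.loc[g["B"] == "Start", "C"].reset_index(drop=True)
diff = start - stop  # labels 0 and 1, values 3000.0 and 1500.0

# Column assignment aligns on labels: only label 0 exists in both indexes,
# so the second difference (label 1) has no row to land on.
g["Start_diff"] = diff
print(g)
```

Both differences are computed, but only the one labelled 0 finds a matching row in `g`.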
My intended output is:
    A      B      C     D     E  F  Start_diff  Res_diff
0   A   Stop  28800  Home  Num:  1        3000      2400
2   A    Res  29400  Home  Num:  1
6   A  Start  31800  Home  Num:  1
10  A   Stop  33300  Home  Num:  2        1500      1140
12  A    Res  33660  Home  Num:  2
16  A  Start  34800  Home  Num:  2
I tried changing for k, g in df[df.D.isin(incl)].groupby('D'): to for k, g in df[df.D.isin(incl)].groupby('D').F.nunique(): but I get the error TypeError: 'numpy.int64' object is not iterable
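A likely reading of that error: `nunique()` collapses each group to a single integer, so the loop no longer receives (key, sub-frame) pairs. The sketch below shows this on a small hypothetical frame (three rows, invented for illustration) and contrasts it with grouping on both columns. The exact wording of the TypeError message may differ between versions.

```python
import pandas as pd

# Hypothetical miniature frame with the same D and F columns.
df = pd.DataFrame({"D": ["Home", "Home", "Away"], "F": ["1", "2", "1"]})

counts = df.groupby("D").F.nunique()  # a Series: one integer per value of D
print(counts)

# Iterating a Series yields its values, so `for k, g in counts` tries to
# unpack a single integer into two names and raises TypeError.
error = None
try:
    for k, g in counts:
        pass
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Grouping by both columns does yield (key, sub-frame) pairs.
keys = [k for k, g in df.groupby(["D", "F"])]
print(keys)
```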
Recommended answer
I believe you need a custom function with groupby on columns D and F, and then replace the duplicated values using mask:
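def f(g):
    # Select within the group itself rather than the outer df.
    Stop = g.loc[g['B'] == 'Stop', 'C']
    Start = g.loc[g['B'] == 'Start', 'C']
    Res = g.loc[g['B'] == 'Res', 'C']
    # Assumes each (D, F) group has one Stop, Res and Start row; take the first value.
    g = g.copy()
    g['Start_diff'] = Start.values[0] - Stop.values[0]
    g['Res_diff'] = Start.values[0] - Res.values[0]
    return g

# group_keys=False keeps the original row index instead of prepending D and F.
df = df[df.D.isin(incl)].groupby(['D', 'F'], group_keys=False).apply(f)
# Keep the diffs only on the first row of each (D, F) group.
df[['Start_diff', 'Res_diff']] = df[['Start_diff', 'Res_diff']].mask(df.duplicated(['D','F']))
print(df)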
          C     D     E  F  A      B  Start_diff  Res_diff
0   28800.0  Home  Num:  1  A   Stop      3000.0    2400.0
2   29400.0  Home  Num:  1  A    Res         NaN       NaN
4   31302.0  Away  Num:  1  A   Stop      2898.0    1800.0
6   31800.0  Home  Num:  1  A  Start         NaN       NaN
8   32400.0  Away  Num:  1  A    Res         NaN       NaN
10  33300.0  Home  Num:  2  A   Stop      1500.0    1140.0
12  33660.0  Home  Num:  2  A    Res         NaN       NaN
14  34200.0  Away  Num:  1  A  Start         NaN       NaN
16  34800.0  Home  Num:  2  A  Start         NaN       NaN
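As a possible alternative to `groupby(...).apply`, the same numbers can be obtained by reshaping. This is only a sketch, not the answerer's method. It assumes each (D, F) pair has exactly one Stop, Res and Start row, and it uses a hand-built reduced frame with C already converted to seconds.

```python
import pandas as pd

# Reduced stand-in for the filtered frame (columns A and E omitted).
df = pd.DataFrame(
    {"C": [28800.0, 29400.0, 31302.0, 31800.0, 32400.0,
           33300.0, 33660.0, 34200.0, 34800.0],
     "D": ["Home", "Home", "Away", "Home", "Away", "Home", "Home", "Away", "Home"],
     "F": ["1", "1", "1", "1", "1", "2", "2", "1", "2"],
     "B": ["Stop", "Res", "Stop", "Start", "Res", "Stop", "Res", "Start", "Start"]},
    index=[0, 2, 4, 6, 8, 10, 12, 14, 16],
)

# One row per (D, F), one column per label in B.
# pivot raises if a (D, F, B) combination occurs more than once.
wide = df.pivot(index=["D", "F"], columns="B", values="C")
diffs = pd.DataFrame({"Start_diff": wide["Start"] - wide["Stop"],
                      "Res_diff": wide["Start"] - wide["Res"]})

# Attach the per-group values to every row, then blank all but the first
# row of each (D, F) pair, mirroring the mask step in the answer.
out = df.join(diffs, on=["D", "F"])
cols = ["Start_diff", "Res_diff"]
out[cols] = out[cols].mask(out.duplicated(["D", "F"]))
print(out)
```

Because `pivot` rejects duplicate entries, this version fails loudly when a group contains repeated labels, whereas taking `.values[0]` would silently use the first one.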