pandas groupby with two keys

Views: 104 | Published: 2018/5/30 14:20:13 | Tags: python, pandas, group-by, aggregate-functions

Problem description

I spent a whole afternoon trying to implement this task but failed. I have a pandas DataFrame like this:

columns = [ka, kb_1, kb_2, timeofEvent, timeInterval]
0: '3M', '2345', '2345', '2014-10-5',  3000
1: '3M', '2958', '2152', '2015-3-22',  5000
2: 'GE', '2183', '2183', '2012-12-31', 515
3: '3M', '2958', '2958', '2015-3-10',  395
4: 'GE', '2183', '2285', '2015-4-19',  1925
5: 'GE', '2598', '2598', '2015-3-17',  1915

What I want to produce is a new data frame grouped by ka and kb_1, as shown below:

columns = [ka, kb, errorNum, errorRate, totalNum of records]
'3M', '2345', 0, 0%,  1
'3M', '2958', 1, 50%, 2
'GE', '2183', 1, 50%, 2
'GE', '2598', 0, 0%,  1

(Definition of an error record: when kb_1 != kb_2, the corresponding record is treated as an abnormal record.)

My code looks like this:

df['isError'] = (df['kb_1'] != df['kb_2']).astype('int')
grouped2 = df.groupby(['ka', 'kb_1'])

df_rst = pd.DataFrame()
df_rst['ka']  =grouped2['ka'].all()
df_rst['kb_1'] = grouped2['kb_1'].all()
df_rst['errorNum'] = grouped2['isError'].transform(sum)
df_rst['totalNum of records'] = grouped2.size()
df_rst['Soll_neq_Letzt_error_rate'] = df_rst['errorNum'].astype('float').div(df_rst['totalNum'].astype('float'), axis='index')
df_rst.to_csv('rst.csv',index=False)

But the result is not what I wanted.

For instance, the column kb_1 becomes True/False, and errorNum becomes NaN. Can anyone explain why and give a workable implementation? Thanks.

Solution

I'm not sure exactly what you did, but I don't think you were that far off.
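
As a hedged aside on why the attempt in the question may misbehave: `transform` returns one value per original row, whereas a reduction such as `sum` returns one value per group, and `all()` is a boolean reduction (it reports whether every value in a group is truthy), which would explain the True/False column. The sketch below only illustrates the shape difference; the sample frame is rebuilt from the rows in the question, and the variable names `per_row` and `per_group` are illustrative, not from the original post.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    'ka':   ['3M', '3M', 'GE', '3M', 'GE', 'GE'],
    'kb_1': ['2345', '2958', '2183', '2958', '2183', '2598'],
    'kb_2': ['2345', '2152', '2183', '2958', '2285', '2598'],
})
df['isError'] = (df['kb_1'] != df['kb_2']).astype('int')
grouped = df.groupby(['ka', 'kb_1'])

# transform broadcasts each group's sum back onto the original rows,
# so the result keeps the original row index 0..5 (six values).
per_row = grouped['isError'].transform('sum')

# sum reduces each group to a single value, indexed by the (ka, kb_1) keys
# (four values, one per group).
per_group = grouped['isError'].sum()

print(per_row.index.tolist())    # row labels of the original frame
print(per_group.index.tolist())  # (ka, kb_1) tuples
```

Because the row-indexed result shares no labels with a frame indexed by (ka, kb_1), index alignment on assignment is a plausible source of the NaN values. Separately, the question's last computation looks up `df_rst['totalNum']`, while the column was created as `'totalNum of records'`. The aggregation that follows reduces each group to a single row instead.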

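# Per (ka, kb_1) group: count the records and sum the error flags.
# Named aggregation requires pandas >= 0.25.
df2 = df.groupby(['ka', 'kb_1'])['isError'].agg(recordNum='count',
                                                errorNum='sum')

df2['errorRate'] = df2['errorNum'] / df2['recordNum']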

         recordNum  errorNum  errorRate
ka kb_1                                
3M 2345          1         0        0.0
   2958          2         1        0.5
GE 2183          2         1        0.5
   2598          1         0        0.0
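
For completeness, here is a minimal end-to-end sketch that rebuilds the sample data from the question and produces a flat table in the column order the question asks for. It assumes pandas 0.25 or later (for keyword-style named aggregation); the names `result`, `totalNum`, and `errorRatePct` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ka':           ['3M', '3M', 'GE', '3M', 'GE', 'GE'],
    'kb_1':         ['2345', '2958', '2183', '2958', '2183', '2598'],
    'kb_2':         ['2345', '2152', '2183', '2958', '2285', '2598'],
    'timeofEvent':  ['2014-10-5', '2015-3-22', '2012-12-31',
                     '2015-3-10', '2015-4-19', '2015-3-17'],
    'timeInterval': [3000, 5000, 515, 395, 1925, 1915],
})

# A record is an error record when kb_1 differs from kb_2.
df['isError'] = (df['kb_1'] != df['kb_2']).astype('int')

result = (
    df.groupby(['ka', 'kb_1'])['isError']
      .agg(errorNum='sum', totalNum='count')  # one row per (ka, kb_1) group
      .reset_index()                          # move the group keys back into columns
)
result['errorRate'] = result['errorNum'] / result['totalNum']

# Optional display column such as '50%' (hypothetical column name).
result['errorRatePct'] = result['errorRate'].map('{:.0%}'.format)

# Column order requested in the question.
result = result[['ka', 'kb_1', 'errorNum', 'errorRate', 'errorRatePct', 'totalNum']]
print(result)
```

Since the keys are ordinary columns after `reset_index()`, writing the table with `result.to_csv('rst.csv', index=False)` keeps `ka` and `kb_1` in the file; with the grouped index left in place, `index=False` would drop them.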
