Missing column after pandas groupby


Question

I have a pandas DataFrame df. I group it by three columns and count the rows in each group. When I do this I lose some information, specifically the name column, which maps 1:1 to the desk_id column. Is there any way to include both in my final DataFrame?

Here is the DataFrame:

   shift_id    shift_start_time      shift_end_time        name                   end_time       desk_id  shift_hour
0  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:16:41.040000  15557987           2
1  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:16:41.096000  15557987           2
2  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:52:17.402000  15557987           2
3  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 11:06:59.083000  15557987           3
4  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 08:27:57.998000  15557987           0

I group it like this:

grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour']).size()
grouped = grouped.reset_index()

And here is the result, missing the name column.

    desk_id  shift_id  shift_hour  0
0  14468690  37729081           0  7
1  14468690  37729081           1  3
2  14468690  37729081           2  6
3  14468690  37729081           3  5
4  14468690  37729082           0  5

Also, is there any way to name the count column 'count' instead of 0?

Solution

You need to include 'name' in the list of groupby keys:

In [43]:

import numpy as np
grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour', 'name']).size()
grouped = grouped.reset_index()
# Rename the default column label 0 to 'count'
grouped.columns = np.where(grouped.columns == 0, 'count', grouped.columns)
print(grouped)
    desk_id  shift_id  shift_hour        name  count
0  15557987  37423064           0  Adam Scott      1
1  15557987  37423064           2  Adam Scott      3
2  15557987  37423064           3  Adam Scott      1
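
As an alternative to the np.where rename, Series.reset_index accepts a name argument, so the count column can be labelled in the same step. The following is a minimal sketch, assuming a reasonably recent pandas version; the small frame is made up to mirror the sample rows in the question and is not the asker's real data.

```python
import pandas as pd

# Illustrative data only: five rows mirroring the sample in the question.
df = pd.DataFrame({
    "shift_id": [37423064] * 5,
    "name": ["Adam Scott"] * 5,
    "desk_id": [15557987] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
})

# size() returns a Series indexed by the group keys;
# reset_index(name=...) converts it to a DataFrame and names the count column.
grouped = (
    df.groupby(["desk_id", "shift_id", "shift_hour", "name"])
      .size()
      .reset_index(name="count")
)
print(grouped)
```

This avoids touching grouped.columns afterwards, which may be easier to read when the rename is the only goal.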

If the name-to-id relationship is many-to-one instead, say a Pete Scott also appears for the same desk_id with the same set of rows, each group is split by name and the result becomes:

    desk_id  shift_id  shift_hour        name  count
0  15557987  37423064           0  Adam Scott      1
1  15557987  37423064           0  Pete Scott      1
2  15557987  37423064           2  Adam Scott      3
3  15557987  37423064           2  Pete Scott      3
4  15557987  37423064           3  Adam Scott      1
5  15557987  37423064           3  Pete Scott      1
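
Because adding name to the keys splits groups whenever the mapping is not 1:1, it may be worth checking that assumption first. The sketch below shows one possible check, plus an alternative that groups by the original three keys and merges the name back in from a lookup table. The sample frame and the variable names (names_per_desk, names_by_desk, result) are hypothetical and only for illustration.

```python
import pandas as pd

# Illustrative data only: five rows mirroring the sample in the question.
df = pd.DataFrame({
    "shift_id": [37423064] * 5,
    "name": ["Adam Scott"] * 5,
    "desk_id": [15557987] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
})

# Check the assumption: every desk_id should map to exactly one name.
names_per_desk = df.groupby("desk_id")["name"].nunique()
is_one_to_one = bool((names_per_desk == 1).all())
print("each desk_id has a single name:", is_one_to_one)

# Group by the original three keys only.
counts = (
    df.groupby(["desk_id", "shift_id", "shift_hour"])
      .size()
      .reset_index(name="count")
)

# Build a desk_id -> name lookup and attach it to the counts.
# validate="many_to_one" makes merge raise an error if a desk_id has several names.
names_by_desk = df[["desk_id", "name"]].drop_duplicates()
result = counts.merge(names_by_desk, on="desk_id", how="left", validate="many_to_one")
print(result)
```

With this approach the counts stay per desk, shift and hour, and a violated mapping surfaces as an error rather than as silently duplicated rows.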
