In pandas, how do I flatten a group of rows?
Problem description
I am new to pandas in Python and would be grateful for any help on this. I have been googling repeatedly but can't seem to crack it.
For example, I have a CSV file with 6 columns. I am trying to group the rows together so that all the data for each group is flattened into one row.
So if my data looks like this:
event event_date event_time name height age
1 2015-05-06 14:00 J Bloggs 185 24
1 2015-05-06 14:00 P Smith 176 55
1 2015-05-06 14:00 T Kirk 193 22
2 2015-05-14 17:00 B Gates 178 72
2 2015-05-14 17:00 J Mayer 184 42
What I want to end up with is the data flattened like this:
event event_date event_time name_1 height_1 age_1 name_2 height_2 age_2 name_3 height_3 age_3
1 2015-05-06 14:00 J Bloggs 185 24 P Smith 176 55 T Kirk 193 22
2 2015-05-14 17:00 B Gates 178 72 J Mayer 184 42
As you can see above, the three rows of the first event have been flattened into one row, and the columns have been expanded to accommodate the row data. The second event has been flattened in the same way, with its data filling the columns.
Any help would be appreciated.
Recommended answer
Steps:
1) Compute the cumulative counts for the groupby object. Add 1 so that the header suffixes start at 1, as in the desired DataFrame.
2) Set the grouping columns, together with the computed cumulative counts, as the index, then unstack the counts level. Additionally, sort the header by its lowermost level.
3) Join the levels of the multi-index columns to flatten them into a single header.
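# 1) Number the rows within each event, starting at 1
cc = df.groupby(['event','event_date','event_time']).cumcount() + 1
# 2) Index by the group keys plus the counter, unstack the counter, sort the header by it
df = df.set_index(['event','event_date','event_time', cc]).unstack().sort_index(axis=1, level=1)
# 3) Join the two header levels into single names such as name_1
df.columns = ['_'.join(map(str,i)) for i in df.columns]
df.reset_index()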
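For reference, here is a minimal end-to-end sketch of the same three steps, run on the sample data from the question. The names `keys`, `values`, `wide`, `ordered` and `result` are illustrative and not part of the original answer. As an added assumption about the desired layout, the sketch orders the columns explicitly (name, height, age within each numbered block) instead of sorting the header.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "event": [1, 1, 1, 2, 2],
    "event_date": ["2015-05-06"] * 3 + ["2015-05-14"] * 2,
    "event_time": ["14:00"] * 3 + ["17:00"] * 2,
    "name": ["J Bloggs", "P Smith", "T Kirk", "B Gates", "J Mayer"],
    "height": [185, 176, 193, 178, 184],
    "age": [24, 55, 22, 72, 42],
})

keys = ["event", "event_date", "event_time"]  # grouping columns
values = ["name", "height", "age"]            # order wanted inside each block

# Step 1: number the rows inside each event, starting at 1.
cc = df.groupby(keys).cumcount() + 1

# Step 2: index by the keys plus the counter, then unstack the counter
# so that it becomes the second level of the column header.
wide = df.set_index(keys + [cc]).unstack()

# Step 3: flatten the two header levels into names such as "name_1".
wide.columns = [f"{col}_{i}" for col, i in wide.columns]

# Assumed layout: name, height, age for block 1, then block 2, and so on.
n_blocks = int(cc.max())
ordered = [f"{col}_{i}" for i in range(1, n_blocks + 1) for col in values]
result = wide[ordered].reset_index()

print(result)
```

Events with fewer rows than the largest group get missing values in their trailing columns (here `name_3`, `height_3` and `age_3` for event 2), so numeric columns may be converted to floating point.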