How to create a dataframe from grouped data


Question

I have a data frame (let's call it "csv") that I want to group, taking the value of the first element of each group. Example:

A    B    C      D
foo  bar  happy  yellow
foo  bar  sad    green
foo  ape  last   laugh

I would like to get this as output:

A    B    C
foo  bar  happy
foo  ape  last

I am currently doing this:

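import pandas as pd

grp1 = csv.groupby(['A', 'B'])
# grp1.groups maps each (A, B) key to the row labels of that group;
# .ix has been removed from pandas, so .loc is used for the label lookup
lst = [(A, B, csv.loc[group[0], 'C']) for (A, B), group in grp1.groups.items()]
df = pd.DataFrame(lst, columns=['A', 'B', 'C'])
df.to_csv('grp.csv', columns=['A', 'B', 'C'], index=False)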

But this seems inefficient. Do I really have to create a list first and then create a dataframe from it? Isn't there a way to create the dataframe directly, or to do some sort of indexing on the original dataframe, so that I can work with just the first record in each group?

Recommended answer

You can pass your own aggregation function to aggregate, one that keeps only the first valid (non-null) element of a column within each group and drops the others.

    In [60]: grp = df.groupby(['A', 'B'])

    In [61]: grp.aggregate({'C': lambda c: c.loc[c.first_valid_index()]})
    Out[61]:
                 C
    A   B  
    foo ape   last
        bar  happy
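
On recent pandas versions the same result can probably be reached without a lambda. The following is a minimal sketch, not part of the original answer: it rebuilds the question's example frame as `csv`, and the names `first_c` and `first_rows` are illustrative only. It assumes that keeping the groups in order of first appearance (`sort=False`) is what the desired output implies.

```python
import pandas as pd

# Example data copied from the question
csv = pd.DataFrame({
    "A": ["foo", "foo", "foo"],
    "B": ["bar", "bar", "ape"],
    "C": ["happy", "sad", "last"],
    "D": ["yellow", "green", "laugh"],
})

# Option 1: first non-null value of C in each (A, B) group.
# as_index=False keeps A and B as ordinary columns;
# sort=False keeps groups in order of first appearance.
first_c = csv.groupby(["A", "B"], sort=False, as_index=False)["C"].first()

# Option 2: first row of each (A, B) combination, nulls included.
first_rows = (
    csv.drop_duplicates(subset=["A", "B"])[["A", "B", "C"]]
    .reset_index(drop=True)
)

print(first_c)
print(first_rows)
```

The two options differ only when C contains missing values: `first()` skips nulls within a group, while `drop_duplicates` keeps whatever is in the first row. Either result can be written out directly with `to_csv('grp.csv', index=False)`.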
