Reshape pandas dataframe from rows to columns


Problem description

I'm trying to reshape my data. At first glance it sounds like a transpose, but it's not. I tried melt, stack/unstack, joins, etc.

Use case

I want only one row per unique individual, with all of their job history spread across the columns. For clients, it can be easier to read information across a row than to read down through columns.

Here is the data:

import pandas as pd
import numpy as np

data1 = {'Name': ["Joe", "Joe", "Joe","Jane","Jane"],
        'Job': ["Analyst","Manager","Director","Analyst","Manager"],
        'Job Eff Date': ["1/1/2015","1/1/2016","7/1/2016","1/1/2015","1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])

df2

Here's what I want it to look like: [Desired Output Table]

Recommended answer

.T within groupby

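# Transpose each person's sub-frame so that their jobs become columns 0, 1, 2, ...
def tgrp(df):
    # Drop the grouping column; errors='ignore' also covers pandas versions
    # that already exclude it from the groups passed to apply.
    df = df.drop(columns='Name', errors='ignore')
    # Renumber the rows 0..n-1 so every person's columns line up after .T
    return df.reset_index(drop=True).T

df2.groupby('Name').apply(tgrp).unstack()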
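For comparison, here is a minimal sketch of a different technique that does not use `groupby.apply`: number each person's jobs with `GroupBy.cumcount()`, put that counter into the index, and `unstack()` it into the columns. The intermediate name `position` is hypothetical. The final `swaplevel` step is meant to put each job's title and date side by side. It should give the same layout as the answer above, though that has only been reasoned about for this sample data.

```python
import pandas as pd

data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])

# Number each person's jobs 0, 1, 2, ... in their original row order.
position = df2.groupby('Name').cumcount()

# Index by (Name, position), then move the position level into the columns.
wide = df2.set_index(['Name', position])[['Job', 'Job Eff Date']].unstack()

# unstack() puts the field names on the outer column level; swap the levels
# and sort so the columns read (0, 'Job'), (0, 'Job Eff Date'), (1, 'Job'), ...
wide = wide.swaplevel(axis=1).sort_index(axis=1)
print(wide)
```

Jane has only two jobs, so her `(2, 'Job')` and `(2, 'Job Eff Date')` cells come out as NaN.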

groupby returns a GroupBy object that records how the original Series or DataFrame has been grouped. Instead of performing a groupby followed immediately by some action, we could first assign df2.groupby('Name') to a variable (I often do), say gb.

gb = df2.groupby('Name')
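A short sketch of what the `gb` object holds before any aggregation is applied. It uses the real GroupBy attributes `ngroups` and `groups` and the method `get_group()`.

```python
import pandas as pd

data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])

gb = df2.groupby('Name')

print(gb.ngroups)            # number of distinct names: 2
print(gb.groups)             # maps each name to the row labels in its group
print(gb.get_group('Jane'))  # the sub-DataFrame holding Jane's rows
```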

On this object gb we could call .mean() to get an average for each group, or .last() to get the last row of each group, or .transform(lambda x: (x - x.mean()) / x.std()) to get a z-score transformation within each group. When you want to do something within a group that has no predefined function, there is still .apply().

.apply() on a groupby object is different from .apply() on a dataframe. For a dataframe, .apply() takes a callable as its argument and applies it to each column (or row) of the dataframe, and the object passed to that callable is a pd.Series. It helps to keep this in mind when you use .apply in a dataframe context. On a groupby object, the object passed to the callable is a dataframe: specifically, one of the groups defined by the groupby.

When I write functions to pass to groupby.apply, I typically name the parameter df to reflect that it is a dataframe.

OK, so we have:

df2.groupby('Name').apply(tgrp)

This generates a sub-dataframe for each 'Name' and passes it to the function tgrp. The groupby object then combines the results of tgrp for every group back into a single dataframe.

It looks like this:

I took the OP's idea of simply transposing to heart, but a few things had to happen first. Had I simply done:

df2[df2.Name == 'Jane'].T

df2[df2.Name == 'Joe'].T

Combining them manually (without groupby):

pd.concat([df2[df2.Name == 'Jane'].T, df2[df2.Name == 'Joe'].T])

Whoa! Now that's ugly. After transposing, the original row labels become column labels: Joe's rows carry [0, 1, 2] and Jane's carry [3, 4], so the two pieces don't line up. Let's reset the index first.

pd.concat([df2[df2.Name == 'Jane'].reset_index(drop=True).T,
           df2[df2.Name == 'Joe'].reset_index(drop=True).T])

That's much better. But now we are getting into the territory groupby was designed to handle, so let it handle it.

Back to:

df2.groupby('Name').apply(tgrp)

The only thing missing is that we need to unstack the result to get the desired output.

