Reshape pandas dataframe from rows to columns
Question
I'm trying to reshape my data. At first glance it sounds like a transpose, but it's not. I've tried melt, stack/unstack, joins, etc.
Use case
I want only one row per unique individual, with all of that person's job history spread across the columns. For clients, it can be easier to read information across a row than down a column.
Here's the data:
import pandas as pd
import numpy as np
data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])
df2
Here's what I want it to look like: [Desired Output Table]
Answer
.T within groupby
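def tgrp(df):
    # Drop the grouping column (errors='ignore': newer pandas may not pass it in)
    df = df.drop(columns='Name', errors='ignore')
    # Number rows 0..n-1 so each person's jobs line up by position, then transpose
    return df.reset_index(drop=True).T

df2.groupby('Name').apply(tgrp).unstack()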
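For reference, here is the whole pipeline as one runnable script: a minimal sketch that rebuilds df2 from the question's data. It assumes a reasonably recent pandas. On pandas 2.2 the groupby.apply call may emit a DeprecationWarning about operating on the grouping column, and later versions may stop passing Name into the function at all; errors='ignore' in the drop keeps tgrp working either way.

```python
import pandas as pd

# Data from the question
data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])


def tgrp(df):
    # Drop the grouping column if it was passed in
    df = df.drop(columns='Name', errors='ignore')
    # Renumber rows 0..n-1 so jobs align by position, then transpose
    return df.reset_index(drop=True).T


# One row per Name; the columns are (job position, field) pairs
wide = df2.groupby('Name').apply(tgrp).unstack()
print(wide)
```

Jane has only two jobs, so her cells for job position 2 come out as NaN.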
groupby returns an object that contains information on how the original series or dataframe has been grouped. Instead of performing a groupby followed immediately by some action, we could first assign df2.groupby('Name') to a variable (I often do), say gb.
gb = df2.groupby('Name')
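A quick way to see what that grouping information looks like, as a small sketch: gb.groups maps each group key to the row labels in that group, and gb.get_group pulls one group back out as an ordinary DataFrame.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                                     "1/1/2015", "1/1/2016"]})

# Creating the groupby object does not aggregate anything yet
gb = df2.groupby('Name')

# Group key -> list of row labels (from the original index) in that group
groups = {name: idx.tolist() for name, idx in gb.groups.items()}
print(groups)

# One group, returned as a plain sub-DataFrame with its original row labels
joe = gb.get_group('Joe')
print(joe)
```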
On this object gb we could call .mean() to get the average of each group's numeric columns, or .last() to get the last element (row) of each group, or .transform(lambda x: (x - x.mean()) / x.std()) to get a z-score transformation within each group. When there is something you want to do within a group that doesn't have a predefined function, there is still .apply().
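The question's df2 has only text columns, so .mean() and the z-score transform have nothing numeric to work on there. The sketch below adds a hypothetical numeric Score column, invented purely for illustration, to show the three kinds of calls side by side.

```python
import pandas as pd

df = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                   'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                   # Hypothetical numeric column, for illustration only
                   'Score': [1.0, 2.0, 3.0, 10.0, 20.0]})
gb = df.groupby('Name')

# Aggregation: one value per group
means = gb['Score'].mean()

# Last row of each group (here: each person's latest job)
last_rows = gb.last()

# Transformation: same length as the input, computed within each group
z = gb['Score'].transform(lambda x: (x - x.mean()) / x.std())
print(means, last_rows, z, sep="\n\n")
```

Note that x.std() uses the sample standard deviation (ddof=1) by default.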
.apply() for a groupby object is different from .apply() for a dataframe. For a dataframe, .apply() takes a callable as its argument and applies that callable to each column (or row) of the dataframe; the object passed to that callable is a pd.Series. When you are using .apply in a dataframe context, it is helpful to keep this fact in mind. In the context of a groupby object, the object passed to the callable is a dataframe: specifically, one of the groups defined by the groupby.
When I write such functions to pass to groupby.apply, I typically name the parameter df to reflect that it is a dataframe.
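One way to check this difference directly is to record the type of whatever the callable receives. This is an illustrative sketch; record_col and record_grp are made-up helper names. It selects the non-key columns before the groupby apply so the group key is left out of each group.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                                     "1/1/2015", "1/1/2016"]})

col_types, grp_types = set(), set()


def record_col(col):
    # DataFrame.apply (default axis=0) hands over one column at a time
    col_types.add(type(col))
    return len(col)


def record_grp(df):
    # GroupBy.apply hands over one whole sub-DataFrame per group
    grp_types.add(type(df))
    return len(df)


col_lengths = df2.apply(record_col)
grp_lengths = df2.groupby('Name')[['Job', 'Job Eff Date']].apply(record_grp)
print(col_types, grp_types)
```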
OK, so we have:
df2.groupby('Name').apply(tgrp)
This generates a sub-dataframe for each 'Name' and passes that sub-dataframe to the function tgrp. The groupby object then stitches the outputs of tgrp for all the groups back together, using each group's 'Name' as the outer level of the resulting index.
It looks like this.
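To make the split-apply-combine concrete, here is a sketch that does the same thing by hand: iterate over the groups, run tgrp on each, and concatenate the results keyed by name. It should match what groupby.apply(tgrp) produces. The tgrp here adds errors='ignore' to the drop, as an assumption, so it also runs on pandas versions that stop passing the Name column.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                                     "1/1/2015", "1/1/2016"]})


def tgrp(df):
    df = df.drop(columns='Name', errors='ignore')
    return df.reset_index(drop=True).T


# Split: iterating a groupby yields (key, sub-DataFrame) pairs
# Apply: run tgrp on each piece
pieces = {name: tgrp(group) for name, group in df2.groupby('Name')}
# Combine: stack the pieces, with the group key as the outer index level
manual = pd.concat(pieces)

# The same thing done by groupby
auto = df2.groupby('Name').apply(tgrp)
print(manual)
```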
I took the OP's original idea of simply transposing to heart, but I had to do some things first. Had I simply done:
df2[df2.Name == 'Jane'].T
df2[df2.Name == 'Joe'].T
and then combined the pieces manually (without groupby):
pd.concat([df2[df2.Name == 'Jane'].T, df2[df2.Name == 'Joe'].T])
Whoa! Now that's ugly. Joe's rows carry the index values [0, 1, 2] and Jane's carry [3, 4]; after transposing, those labels become columns that don't mesh. So let's reset the index.
pd.concat([df2[df2.Name == 'Jane'].reset_index(drop=True).T,
           df2[df2.Name == 'Joe'].reset_index(drop=True).T])
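To see why the reset matters, compare the shapes the two versions produce. This is a small sketch on the question's data; jane, joe, ugly and better are just illustrative variable names.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                                     "1/1/2015", "1/1/2016"]})

jane, joe = df2[df2.Name == 'Jane'], df2[df2.Name == 'Joe']

# Without reset: Jane's labels 3, 4 and Joe's 0, 1, 2 become five separate columns
ugly = pd.concat([jane.T, joe.T])
# With reset: both people's jobs are numbered from 0, so the columns line up
better = pd.concat([jane.reset_index(drop=True).T,
                    joe.reset_index(drop=True).T])

print(ugly.shape, better.shape)
```

Both versions hold the same 15 values; the reset only changes how many empty cells surround them (15 NaN out of 30 cells versus 3 out of 18).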
That's much better. But now we are getting into the territory that groupby was designed to handle, so let it handle it.
Back to:
df2.groupby('Name').apply(tgrp)
The only thing missing here is to unstack the result: .unstack() moves the inner index level (Job, Job Eff Date) into the columns, leaving one row per Name, which is the desired output.
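Here is that last step in isolation, as a sketch: stacked is the output of df2.groupby('Name').apply(tgrp), and .unstack() moves its inner index level into the columns. The final line, which flattens the (position, field) column pairs into labels such as "Job 1", is an optional extra that is not part of the answer; the label format is an arbitrary choice.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                                     "1/1/2015", "1/1/2016"]})


def tgrp(df):
    df = df.drop(columns='Name', errors='ignore')
    return df.reset_index(drop=True).T


stacked = df2.groupby('Name').apply(tgrp)  # index: (Name, field); columns: 0, 1, 2
wide = stacked.unstack()                    # field level moves into the columns

# Optional cosmetic step: flatten (position, field) pairs into single labels
wide.columns = [f"{field} {pos + 1}" for pos, field in wide.columns]
print(wide)
```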