Transpose multiple columns of a Pandas dataframe
Question
I'm trying to reshape a dataframe, but I'm not able to get the results I need. The dataframe looks like this:
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
I need to reshape the dataframe so it will look like this:
m r s p O W N p O W N p O W N
1 4 3 1 2.81 3.70 3.03 2 0.58 0.78 0.60 3 1.98 1.34 1.81
1 4 4 1 2.14 2.82 2.31 2 0.67 0.00 0.00 3 0.00 0.04 0.15
1 4 5 1 1.47 1.94 1.59 2 1.03 2.45 1.68 3 0.01 0.00 0.26
I tried using the pivot_table function:
df.pivot_table(index=['m','r','s'], columns=['p'], values=['O','W','N'])
but I'm not able to get quite what I want. Does anyone know how to do this?
Recommended answer
As someone who fancies himself pretty handy with pandas, I still find the pivot_table and melt functions confusing. I prefer to stick with a well-defined and unique index and use the stack and unstack methods of the dataframe itself.
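To make the stack/unstack idea concrete, here is a minimal sketch on a tiny made-up frame. The names `small`, `wide`, `long_again` and the `key` column are hypothetical and only serve the illustration; the point is that `unstack` moves a row-index level into the columns and `stack` moves it back.

```python
import pandas as pd

# Hypothetical data: a unique two-level row index (key, p) and one value column
small = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "p": [1, 2, 1, 2], "O": [2.81, 0.58, 2.14, 0.67]}
).set_index(["key", "p"])

# unstack moves the 'p' row level into the columns: one row per key
wide = small.unstack(level="p")

# stack moves the 'p' column level back into the rows (the long layout)
long_again = wide.stack(level="p")

print(wide)
print(long_again)
```

Both methods rely on the index being unique, which is why the answer sets the index on every identifying column first.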
First, I'll ask: do you really need to repeat the p column like that? I can sort of see its value when presenting data, but IMO pandas isn't really set up to work like that. We could shoehorn it in, but let's see if a simpler solution gets you what you need.
Here's what I would do:
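from io import StringIO

import pandas

datatable = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26""")

# read the whitespace-delimited table, index on m, r, s, p,
# then move the p level from the rows into the columns
df = (
    pandas.read_table(datatable, sep=r'\s+')
    .set_index(['m', 'r', 's', 'p'])
    .unstack(level='p')
)

# put p on the top column level
df.columns = df.columns.swaplevel(0, 1)

# DataFrame.sort no longer exists, and sort_index(axis=1) would order the inner
# level alphabetically (N, O, W), so reindex to group by p and keep O, W, N order
cols = pandas.MultiIndex.from_product([[1, 2, 3], list('OWN')], names=['p', None])
df = df.reindex(columns=cols)
print(df)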
Which prints:
p         1                 2                 3
          O     W     N     O     W     N     O     W     N
m r s
1 4 3  2.81  3.70  3.03  0.58  0.78  0.60  1.98  1.34  1.81
    4  2.14  2.82  2.31  0.67  0.00  0.00  0.00  0.04  0.15
    5  1.47  1.94  1.59  1.03  2.45  1.68  0.01  0.00  0.26
So now the columns are a MultiIndex and you can access, for example, all of the values where p = 2 with df[2] or df.xs(2, level='p', axis=1), which gives me:
          O     W     N
m r s
1 4 3  0.58  0.78  0.60
    4  0.67  0.00  0.00
    5  1.03  2.45  1.68
Similarly, you can get all of the W columns with df.xs('W', level=1, axis=1) (we say level=1 because that column level does not have a name, so we use its position instead):
p         1     2     3
m r s
1 4 3  3.70  0.78  1.34
    4  2.82  0.00  0.04
    5  1.94  2.45  0.00
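The column selections described above can be checked side by side. This is a self-contained sketch that rebuilds the wide frame from the question's data; the variable names `raw`, `p2`, `p2_xs` and `w_cols` are made up for the example.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26""")

# Build the wide frame: p on the top column level, O/W/N underneath
df = pd.read_csv(raw, sep=r"\s+").set_index(["m", "r", "s", "p"]).unstack("p")
df.columns = df.columns.swaplevel(0, 1)
cols = pd.MultiIndex.from_product([[1, 2, 3], list("OWN")], names=["p", None])
df = df.reindex(columns=cols)

# Two ways to select every column where p == 2
p2 = df[2]
p2_xs = df.xs(2, level="p", axis=1)

# Select the W column for every p; the inner level is unnamed, so use position 1
w_cols = df.xs("W", level=1, axis=1)

print(p2)
print(w_cols)
```

Plain indexing with `df[2]` is shorter, while `xs` also works for levels other than the outermost one.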
You can similarly query the rows by using axis=0.
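As a sketch of the row-wise case, the same `xs` call with `axis=0` selects on a level of the row index. The example below rebuilds the wide frame and picks the rows where s equals 4; `raw` and `s4` are names made up for the illustration.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26""")

# Build the wide frame: p on the top column level, O/W/N underneath
df = pd.read_csv(raw, sep=r"\s+").set_index(["m", "r", "s", "p"]).unstack("p")
df.columns = df.columns.swaplevel(0, 1)
cols = pd.MultiIndex.from_product([[1, 2, 3], list("OWN")], names=["p", None])
df = df.reindex(columns=cols)

# axis=0 cross-sections the row index: keep the rows where the 's' level is 4
s4 = df.xs(4, level="s", axis=0)
print(s4)
```

The selected level is dropped from the result, so `s4` is indexed by m and r only.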
If you really need the p values in a column, just add it there manually and reindex your columns:
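# add a constant 'p' column under each top-level p value
for p in df.columns.get_level_values('p').unique():
    df[p, 'p'] = p

# order the columns as p, O, W, N within each p group
cols = pandas.MultiIndex.from_product([[1, 2, 3], list('pOWN')])
df = df.reindex(columns=cols)
print(df)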
       1                    2                    3
       p     O     W     N  p     O     W     N  p     O     W     N
m r s
1 4 3  1  2.81  3.70  3.03  2  0.58  0.78  0.60  3  1.98  1.34  1.81
    4  1  2.14  2.82  2.31  2  0.67  0.00  0.00  3  0.00  0.04  0.15
    5  1  1.47  1.94  1.59  2  1.03  2.45  1.68  3  0.01  0.00  0.26
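If the flat layout from the question (a single header row with p, O, W, N repeated) is truly required, one possible follow-up is to drop the top column level and reset the row index. This is a sketch that goes beyond the original answer, and `raw` and `flat` are made-up names. Note that it leaves duplicate column names, so label-based selection such as `flat["O"]` returns several columns at once.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26""")

# Build the wide frame with p on the top column level
df = pd.read_csv(raw, sep=r"\s+").set_index(["m", "r", "s", "p"]).unstack("p")
df.columns = df.columns.swaplevel(0, 1)

# Add a constant p column to each group, then order every group as p, O, W, N
for p in [1, 2, 3]:
    df[p, "p"] = p
df = df.reindex(columns=pd.MultiIndex.from_product([[1, 2, 3], list("pOWN")]))

# Keep only the inner labels and turn m, r, s back into ordinary columns
flat = df.copy()
flat.columns = flat.columns.get_level_values(1)
flat = flat.reset_index()
print(flat)
```

For anything other than display or export, the MultiIndex version is usually easier to work with, since each column stays uniquely addressable.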