Pandas - unstack column values into new columns
Problem description
I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:
import pandas as pd
df = pd.DataFrame([["a","g","n1","y1"], ["a","g","n2","y2"], ["b","h","n1","y3"], ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])
>>> df
meta1 meta2 name data
a g n1 y1
a g n2 y2
b h n1 y3
b h n2 y4
where the name column holds the names of the new columns I would like, and the data column holds the respective values.
I would like to produce a dataframe of the form:
df = pd.DataFrame([["a","g","y1","y2"], ["b","h","y3","y4"]], columns=["meta1", "meta2", "n1", "n2"])
>>> df
meta1 meta2 n1 n2
a g y1 y2
b h y3 y4
The meta columns stand in for around 15+ other columns that contain most of the data, and I don't think they are particularly well suited for indexing. The idea is that I have a lot of repeated/redundant data stored in the meta columns at the moment, and I would like to produce the more compact dataframe presented above.
I have found some similar Qs but can't pinpoint what sort of operations I need to do: pivot, re-index, stack or unstack, etc.?
PS - the original index values are unimportant for my purposes.
Any help would be much appreciated.
Question I think is related:
I think the following Q is related to what I am trying to do, but I can't see how to apply it, as I don't want to produce more indexes.
If you group your meta columns into a list then you can do this:
metas = ['meta1', 'meta2']
new_df = df.set_index(['name'] + metas).unstack('name')
print(new_df)
            data
name          n1  n2
meta1 meta2
a     g       y1  y2
b     h       y3  y4
Which gets you most of the way there. Additional tailoring can get you the rest of the way.
print(new_df["data"].rename_axis(None, axis=1).reset_index())
  meta1 meta2  n1  n2
0     a     g  y1  y2
1     b     h  y3  y4
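For reference, the steps above can be put together into one self-contained sketch. The helper name widen and its parameters name_col and value_col are hypothetical, introduced here only for illustration. It assumes each combination of meta values and name occurs at most once. The DataFrame.pivot variant at the end is an alternative not mentioned in the answer, and passing a list as index assumes pandas 1.1 or later.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [["a", "g", "n1", "y1"],
     ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"],
     ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

metas = ["meta1", "meta2"]


def widen(frame, metas, name_col="name", value_col="data"):
    """Hypothetical helper: spread value_col into one column per name_col value."""
    # Selecting the single value column before unstacking avoids the extra
    # "data" level in the resulting column index.
    wide = frame.set_index(metas + [name_col])[value_col].unstack(name_col)
    # Drop the leftover "name" label on the column axis, then turn the
    # meta index levels back into ordinary columns.
    return wide.rename_axis(None, axis=1).reset_index()


result = widen(df, metas)
print(result)

# Alternative (assumes pandas >= 1.1, where pivot accepts a list for index).
alt = (
    df.pivot(index=metas, columns="name", values="data")
      .rename_axis(None, axis=1)
      .reset_index()
)
```

Both variants raise a ValueError if the same meta/name combination appears more than once, so duplicated rows would need to be dropped or aggregated first.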