Split a pandas Series without a MultiIndex
Question
I would like to take a pandas Series with a single-level index and split on that index into a DataFrame with multiple columns. For instance, for input:
s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
s
a 10
a 11
b 12
b 13
c 14
c 15
c 16
dtype: int64
my desired output is:
     a    b   c
0   10   12  14
1   11   13  15
2  NaN  NaN  16
I cannot use unstack directly because it requires a MultiIndex and I only have a single-level index. I tried adding a dummy index level in which every entry had the same value, but I got the error "ReshapeError: Index contains duplicate entries, cannot reshape".
I know that this is a little unusual because 1) pandas doesn't like ragged arrays, so padding will be needed, 2) the index needs to be arbitrarily reset, and 3) I can't really "initialize" the DataFrame until I know how long the longest column will be. But this still seems like something I should be able to do. I also thought about doing it via groupby, but there doesn't seem to be anything like grouped_df.values() that works without some kind of aggregating function, probably for the above reasons.
Recommended answer
You can use groupby, apply, and reset_index to create a MultiIndex Series, and then call unstack:
import pandas as pd
s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
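# Renumber each group from 0 so the result has a (label, position) MultiIndex,
# then move the label level into the columns.
df = s.groupby(level=0).apply(pd.Series.reset_index, drop=True).unstack(0)
print(df)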
Output:
     a    b   c
0   10   12  14
1   11   13  15
2  NaN  NaN  16
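If it helps to see the intermediate steps, here is a minimal sketch of the same idea written with groupby(...).cumcount() instead of apply. This is a swapped-in variant, not the code from the answer above, and the names pos, two_level, and wide are only illustrative. It numbers each row within its label group, builds the two-level index explicitly, and then unstacks the label level:

```python
import pandas as pd

s = pd.Series(range(10, 17), index=['a', 'a', 'b', 'b', 'c', 'c', 'c'])

# Position of each row within its label group: 0, 1, 0, 1, 0, 1, 2
pos = s.groupby(level=0).cumcount()

# Pair every value with a unique (position, label) key so unstack can reshape it
two_level = pd.Series(
    s.to_numpy(),
    index=pd.MultiIndex.from_arrays([pos.to_numpy(), s.index]),
)

# Move the label level (the last one) into the columns; shorter groups are padded with NaN
wide = two_level.unstack()
print(wide)
```

Because each (position, label) pair is unique, unstack no longer complains about duplicate entries, which seems to be what the dummy-index attempt in the question ran into.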