Reindex 2nd level in multi-level dataframe


Question

I need to reindex the 2nd level of a pandas dataframe so that, for each 1st-level value, the 2nd level becomes np.arange(N). I tried to follow this, but unfortunately it only produces an index with as many rows as already existed. What I want is for new rows (with NaN values) to be inserted for each new index value.

In [79]:

import pandas as pd

df = pd.DataFrame({
  'first': ['one', 'one', 'one', 'two', 'two', 'three'], 
  'second': [0, 1, 2, 0, 1, 1],
  'value': [1, 2, 3, 4, 5, 6]
})
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6
In [80]:

# renumber 'second' as 0, 1, 2, ... within each 'first' group (adds no rows)
df['second'] = df.groupby('first').cumcount()
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

The result I want is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2      nan
5  three       0      6
5  three       1      nan
5  three       2      nan

Recommended answer

I think you can first set the columns first and second as a multi-level index, and then reindex against the full product of first-level values and np.arange(N).

# your data
# ==========================
import numpy as np
import pandas as pd

df = pd.DataFrame({
  'first': ['one', 'one', 'one', 'two', 'two', 'three'], 
  'second': [0, 1, 2, 0, 1, 1],
  'value': [1, 2, 3, 4, 5, 6]
})

df

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# processing
# ============================
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])

df.set_index(['first', 'second']).reindex(multi_index).reset_index()

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
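
Note that in the output above, three keeps its value 6 at second == 1, because the answer starts from the original data rather than the renumbered one. If you want the layout from the question (6 at second == 0), one option is to apply the cumcount step first and then reindex. Below is a minimal sketch of that combination; the names n, full_index and result are just illustrative, and taking N as the largest group size is an assumption, so substitute whatever N you actually need.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

# Step 1 (from the question): renumber 'second' within each 'first' group.
df['second'] = df.groupby('first').cumcount()

# Step 2: assumption for this sketch: N is the largest group size.
n = int(df['second'].max()) + 1

# Step 3 (from the answer): build every (first, second) pair and reindex,
# which inserts NaN rows for the pairs that did not exist before.
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), np.arange(n)], names=['first', 'second']
)
result = df.set_index(['first', 'second']).reindex(full_index).reset_index()
print(result)
```

Because reindexing introduces NaN, the value column comes back as float rather than int.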
