Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

Question

I am trying to unstack a multi-index with pandas and I keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

  • id (string)
  • date (string)
  • location (string)
  • value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location level:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

Recommended answer

Here's an example DataFrame that shows this: it has more than one value for the same index entry, so unstack cannot decide which value belongs in the reshaped cell. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape
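Before choosing a fix, it can help to see exactly which index entries are duplicated. The following is a minimal sketch that rebuilds the example frame above and lists the offending rows; the variable names dup_mask and duplicates are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the session above.
df = pd.DataFrame({
    0: [1, 1, 1, 2],
    1: [2, 2, 4, 5],
    2: ["a", "a", "a", "b"],
    3: [16.86, 17.18, 17.03, 17.28],
})
df1 = df.set_index([0, 1, 2])

# keep=False marks every row whose index tuple occurs more than once,
# not only the second and later occurrences.
dup_mask = df1.index.duplicated(keep=False)
duplicates = df1[dup_mask]

print(df1.index.is_unique)  # False whenever unstack would raise this error
print(duplicates)           # the rows sharing the index entry (1, 2, 'a')
```

If duplicates comes back empty, the index is unique and the error has some other cause.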

One solution is to reset_index (and get back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28
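If you would rather stay on the indexed frame, aggregating the duplicates with groupby before unstacking should give the same table. This is a sketch of an alternative, not the method from the answer, and it assumes that averaging duplicates is acceptable; the name wide is illustrative.

```python
import pandas as pd

# Rebuild the example frame from the session above.
df = pd.DataFrame({
    0: [1, 1, 1, 2],
    1: [2, 2, 4, 5],
    2: ["a", "a", "a", "b"],
    3: [16.86, 17.18, 17.03, 17.28],
})
df1 = df.set_index([0, 1, 2])

# Collapse rows that share an index tuple to their mean; the resulting
# index is unique, so unstack no longer raises.
wide = df1[3].groupby(level=[0, 1, 2]).mean().unstack(2)
print(wide)
```

Swapping mean for another aggregation (sum, first, max) changes how the duplicates are collapsed without changing the shape of the result.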

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...
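The answer does not show code for the dummy-level approach, so here is one possible reading of it. It assumes the dummy level is a running counter of repeats within each index entry, built with groupby(...).cumcount(); the column name "dup" is hypothetical.

```python
import pandas as pd

# Rebuild the example frame from the session above.
df = pd.DataFrame({
    0: [1, 1, 1, 2],
    1: [2, 2, 4, 5],
    2: ["a", "a", "a", "b"],
    3: [16.86, 17.18, 17.03, 17.28],
})
df1 = df.set_index([0, 1, 2])

# Number the repeats within each index tuple: 0 for the first row, 1 for the
# second, and so on. "dup" is a hypothetical name for the dummy level.
df2 = df1.copy()
df2["dup"] = df1.groupby(level=[0, 1, 2]).cumcount().to_numpy()
df2 = df2.set_index("dup", append=True)  # index is now unique

# Unstack the original level 2 while the dummy level keeps the rows apart,
# then drop the dummy level again.
wide = df2[3].unstack(2).droplevel("dup")
print(wide)
```

No values are aggregated here: each duplicated entry keeps its own row, so the resulting index still contains repeats, for example (1, 2) appears twice.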
