Python/Pandas-ValueError:索引包含重复的条目,无法重塑 [英] Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape

查看:77
本文介绍了Python/Pandas-ValueError:索引包含重复的条目,无法重塑的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个称为"bal"的数据框.看起来像这样:

I have a dataframe called 'bal'. It looks like this:

              ano   id   unit period
business_id                         
9564         2012  302  sdasd  anual
9564         2011  303  sdasd  anual
2361         2013  304  sdasd  anual
2361         2012  305  sdasd  anual
...

我正在运行以下代码:

bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)

我的意图是将其变成这样:

My intention is to turn that into something like this:

               ano    id  unit    period  2006  2007  2008  2009  2010  \
 business_id                                                                     

 72           2013   774  sdasd   anual   NaN   NaN   NaN   NaN   NaN   

 72           2012   775  sdasd   anual   NaN   NaN   NaN   NaN   NaN   

 74           2012  1120  sdasd   anual   NaN   NaN   NaN   NaN   NaN   

 119          2013   875  sdasd   anual   NaN   NaN   NaN   NaN   NaN   

 119          2012   876  sdasd   anual   NaN   NaN   NaN   NaN   NaN   

 ...

当我编写该代码时,出现此错误:

When I that code, I get this error:

ValueError: Index contains duplicate entries, cannot reshape

因此,为了避免重复,我添加了一个drop_duplicates行:

So to avoid duplicates, I added a drop_duplicates line:

bal=bal.drop_duplicates()
bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)

运行代码时,我遇到了同样的问题:

When I run the code, voilá, I get the same problem:

ValueError: Index contains duplicate entries, cannot reshape

我是在做错什么还是误解了什么?

Am I doing something wrong or misunderstanding something?

编辑

bal 是我正在使用以下代码从SQL创建的数据框:

bal is a dataframe I'm creating out of a SQL using the following code:

bal=pd.read_sql('select * from table;',connection).set_index('business_id')[['ano','id','unit','period']]

奇怪的是,如果我限制SQL查询,它将很好地工作:

The weird thing is that if I limit the SQL query it works fine:

bal=pd.read_sql('select * from table limit 1000;',connection).set_index('business_id')[['ano','id','unit','period']]

我认为问题可能与索引有很多重复这一事实有关(如您在上面的示例中所见).但是,如果我print(bal.head(4))在这个受限的bal中看起来与上面看到的完全相同,并且索引重复.

I thought that the problem could be related to the fact that the index has a lot of duplication (as you can see in that example above). However if I print(bal.head(4)) in this limited bal it looks exactly the same as the one you can see above, with indexes that repeat.

推荐答案

UPDATE2:

qry = "select distinct business_id,ano,id,unit,period from table where period='anual'"
bal=pd.read_sql(qry, connection, index_col=['business_id'])

假设我们得到以下DF(在ano列中仍然有重复的值):

assume we get the following DF (still with duplicated values in the ano column):

In [167]: bal
Out[167]:
              ano   id   unit period
business_id
9564         2012  302  sdasd  anual
9564         2012  299  sdasd  anual
9564         2011  303  sdasd  anual
2361         2013  304  sdasd  anual
2361         2012  305  sdasd  anual

我们可以这样做:

In [169]: bal.join(bal.pivot_table(index=bal.index, columns='ano',
                                   values='id', aggfunc='first'))
Out[169]:
              ano   id   unit period   2011   2012   2013
business_id
2361         2013  304  sdasd  anual    NaN  305.0  304.0
2361         2012  305  sdasd  anual    NaN  305.0  304.0
9564         2012  302  sdasd  anual  303.0  302.0    NaN
9564         2012  299  sdasd  anual  303.0  302.0    NaN
9564         2011  303  sdasd  anual  303.0  302.0    NaN

更新:

请考虑以下示例DF:

In [161]: bal
Out[161]:
              ano   id   unit period
business_id
9564         2012  302  sdasd  anual
9564         2012  299  sdasd  anual   # i've intentionally added this row with duplicated `ano`
9564         2011  303  sdasd  anual
2361         2013  304  sdasd  anual
2361         2012  305  sdasd  anual

重现您的错误:

In [162]: bal.pivot(columns='ano', values='id')
...
skipped
...
ValueError: Index contains duplicate entries, cannot reshape

旧答案:

这就是你想要的吗?

In [144]: bal.join(bal.pivot(columns='ano', values='id'))
Out[144]:
              ano   id   unit period   2011   2012   2013
business_id
2361         2013  304  sdasd  anual    NaN  305.0  304.0
2361         2012  305  sdasd  anual    NaN  305.0  304.0
9564         2012  302  sdasd  anual  303.0  302.0    NaN
9564         2011  303  sdasd  anual  303.0  302.0    NaN

这篇关于Python/Pandas-ValueError:索引包含重复的条目,无法重塑的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆