"ValueError: cannot reindex from a duplicate axis"

Question

I have the following df:

Timestamp                            A      B      C     ...     
2014-11-09 00:00:00                     NaN     1      NaN   NaN      
2014-11-09 00:00:00                      2     NaN     NaN   NaN             
2014-11-09 00:00:00                     NaN    NaN     3     NaN   
2014-11-09 08:24:00                     NaN    NaN     1     NaN         
2014-11-09 08:24:00                     105    NaN     NaN   NaN           
2014-11-09 09:19:00                     NaN    NaN     23    NaN          

I would like to turn it into the following:

Timestamp                            A      B      C     ...     
2014-11-09 00:00:00                  2      1      3     NaN      
2014-11-09 00:01:00                  NaN    NaN    NaN   NaN
2014-11-09 00:02:00                  NaN    NaN    NaN   NaN
...                                  NaN    NaN    NaN   NaN
2014-11-09 08:23:00                  NaN    NaN    NaN   NaN
2014-11-09 08:24:00                  105    NaN     1    NaN         
2014-11-09 08:25:00                  NaN    NaN     NaN  NaN     
2014-11-09 08:26:00                  NaN    NaN     NaN  NaN
2014-11-09 08:27:00                  NaN    NaN     NaN  NaN      
...                                  NaN    NaN     NaN  NaN      
2014-11-09 09:18:00                  NaN    NaN     NaN  NaN  
2014-11-09 09:19:00                  NaN    NaN     23   NaN      

That is: I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN in every column that has no value for a given minute.

I started with the following:

df.groupby('Timestamp').sum()

df = df.resample('1Min', how='max')

but I got the following error:

ValueError: cannot reindex from a duplicate axis

How can I solve this problem? I'm just learning Python, so I have no experience at all.

Thanks!

Recommended answer

Assuming you have Timestamp as the index to begin with, you need to resample first, then reset_index, and only then do the groupby. Here's a working sample:

import pandas as pd

df
                       A   B   C  ...
Timestamp                            
2014-11-09 00:00:00  NaN   1 NaN  NaN
2014-11-09 00:00:00    2 NaN NaN  NaN
2014-11-09 00:00:00  NaN NaN   3  NaN
2014-11-09 08:24:00  NaN NaN   1  NaN
2014-11-09 08:24:00  105 NaN NaN  NaN
2014-11-09 09:19:00  NaN NaN  23  NaN

df.resample('1Min', how='max').reset_index().groupby('Timestamp').sum()

                      A   B   C  ...
Timestamp                           
2014-11-09 00:00:00   2   1   3  NaN
2014-11-09 00:01:00 NaN NaN NaN  NaN
2014-11-09 00:02:00 NaN NaN NaN  NaN
2014-11-09 00:03:00 NaN NaN NaN  NaN
2014-11-09 00:04:00 NaN NaN NaN  NaN
...
2014-11-09 09:17:00 NaN NaN NaN  NaN
2014-11-09 09:18:00 NaN NaN NaN  NaN
2014-11-09 09:19:00 NaN NaN  23  NaN

Hope this helps.
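For readers on a recent pandas release, where (to my knowledge) resample no longer accepts the how= keyword, the same idea can be sketched with the method-style call. This is only a minimal sketch under assumptions: the index is already a DatetimeIndex named 'Timestamp', the frame is rebuilt by hand from the three visible columns of the sample (the elided "..." columns are left out), and max is kept as the aggregation because that is what the answer uses.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question: several rows share one timestamp.
index = pd.DatetimeIndex(
    [
        "2014-11-09 00:00:00",
        "2014-11-09 00:00:00",
        "2014-11-09 00:00:00",
        "2014-11-09 08:24:00",
        "2014-11-09 08:24:00",
        "2014-11-09 09:19:00",
    ],
    name="Timestamp",
)
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=index,
)

# Bin the rows into 1-minute buckets and take the max per column.
# Rows with the same timestamp fall into the same bucket and are merged;
# minutes without any data come out as all-NaN rows.
result = df.resample("1min").max()

print(result.head())
print(result.tail())
```

With this data the result has one row per minute from 00:00 to 09:19, and the index no longer contains duplicates.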

As noted in the comments, your 'Timestamp' isn't a datetime and is probably stored as a string, so you cannot resample on it as a DatetimeIndex. Just reset_index and convert it, something like this:

df = df.reset_index()
df['ts'] = pd.to_datetime(df['Timestamp'])
# 'ts' is now datetime of 'Timestamp', you just need to set it to index
df = df.set_index('ts')
...

Now just run the previous code again, replacing 'Timestamp' with 'ts', and you should be OK.
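Put together, the conversion and the resample might look like the sketch below. The small frame is hypothetical (it is not the asker's data), the 'Timestamp' index is assumed to hold strings, and the method-style .max() call is used because newer pandas releases, as far as I know, do not accept how=. The two checks on the index are optional and are only there to make the starting state visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose 'Timestamp' index holds strings, not datetimes.
df = pd.DataFrame(
    {
        "A": [np.nan, 2, 105],
        "C": [3, np.nan, np.nan],
    },
    index=pd.Index(
        ["2014-11-09 00:00:00", "2014-11-09 00:00:00", "2014-11-09 00:03:00"],
        name="Timestamp",
    ),
)

# A plain string index is not a DatetimeIndex, so it cannot be resampled yet.
was_datetime = isinstance(df.index, pd.DatetimeIndex)

# Convert as in the answer: move the index into a column, parse it,
# and use the parsed column as the new index.
df = df.reset_index()
df["ts"] = pd.to_datetime(df["Timestamp"])
# Drop the original string column so only numeric columns get aggregated.
df = df.set_index("ts").drop(columns="Timestamp")

# The index still repeats 00:00:00 at this point.
has_duplicates = df.index.has_duplicates

# Aggregate into 1-minute buckets keyed on 'ts'.
result = df.resample("1min").max()
print(result)
```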
