"ValueError:无法从重复的轴重新索引" [英] "ValueError: cannot reindex from a duplicate axis"
问题描述
我有以下df:
Timestamp A B C ...
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
I would like to obtain the following:
Timestamp A B C ...
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 08:23:00 NaN NaN NaN NaN
2014-11-09 08:24:00 105 NaN 1 NaN
2014-11-09 08:25:00 NaN NaN NaN NaN
2014-11-09 08:26:00 NaN NaN NaN NaN
2014-11-09 08:27:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
That is: I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN in any column that has no value for a given minute.
I started with the following:
df.groupby('Timestamp').sum()
and
df = df.resample('1Min', how='max')
but I got the following error:
ValueError: cannot reindex from a duplicate axis
How can I solve this problem? I'm just learning Python, so I have no experience at all.
Thanks!
Recommended answer
Assuming that you have Timestamp as the index to begin with, you need to do the resample first, and then reset_index before doing a groupby. Here's a working sample:
import pandas as pd
df
A B C ...
Timestamp
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
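# Note: the how= argument was deprecated in pandas 0.18 and later removed;
# on newer versions write df.resample('1Min').max() in place of resample('1Min', how='max').
df.resample('1Min', how='max').reset_index().groupby('Timestamp').sum()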
A B C ...
Timestamp
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
2014-11-09 00:03:00 NaN NaN NaN NaN
2014-11-09 00:04:00 NaN NaN NaN NaN
...
2014-11-09 09:17:00 NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
Hope this helps.
As noted in the comments, your 'Timestamp' isn't a datetime and is probably a string, so you cannot resample on a DatetimeIndex. Just reset_index and convert it, something like this:
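For readers on a recent pandas release, where resample no longer accepts how=, the same result can be sketched with the method-style call. This is a minimal sketch, not the answerer's exact code. The sample frame is rebuilt from the first three columns shown in the question. The trailing groupby is left out on the assumption that taking max() over each 1-minute bin already merges rows with duplicate timestamps. In recent pandas, sum() over an all-NaN group also returns 0 by default, which would overwrite the NaN rows the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question: a DatetimeIndex with duplicate timestamps.
idx = pd.to_datetime([
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 08:24:00",
    "2014-11-09 08:24:00",
    "2014-11-09 09:19:00",
])
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=idx,
)
df.index.name = "Timestamp"

# max() skips NaN inside each 1-minute bin, so rows sharing a timestamp
# collapse into one row; bins with no data at all stay NaN.
result = df.resample("1min").max()

print(result.head())
print(result.index.is_unique)
```

The resampled index has one row per minute between the first and last timestamp, so it no longer contains duplicates.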
df = df.reset_index()
df['ts'] = pd.to_datetime(df['Timestamp'])
# 'ts' is now datetime of 'Timestamp', you just need to set it to index
df = df.set_index('ts')
...
Now just run the previous code again, but replace 'Timestamp' with 'ts', and you should be OK.
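Putting the conversion and the resample together, the whole flow might look like the sketch below. It assumes the timestamps arrive as strings in an ordinary column. The frame `raw`, its values, and the choice to drop the original string column are illustrative assumptions rather than part of the original answer, and the resample uses the newer method-style max() call.

```python
import pandas as pd

# Hypothetical input: 'Timestamp' holds strings, and some values repeat.
raw = pd.DataFrame({
    "Timestamp": ["2014-11-09 00:00:00", "2014-11-09 00:00:00", "2014-11-09 08:24:00"],
    "A": [None, 2.0, 105.0],
    "B": [1.0, None, None],
})

# Parse the strings into real datetimes and use them as the index.
raw["ts"] = pd.to_datetime(raw["Timestamp"])
clean = raw.drop(columns="Timestamp").set_index("ts")

# The index is now a DatetimeIndex, but it still contains duplicates.
print(type(clean.index).__name__)
print(clean.index.has_duplicates)

# Resampling to 1-minute bins merges the duplicates and fills the gaps with NaN.
per_minute = clean.resample("1min").max()
print(per_minute.head())
```

Checking `index.has_duplicates` before resampling is a quick way to confirm whether duplicate labels are what triggers the reindexing error.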