Resampling Error: cannot reindex a non-unique index with a method or limit

Published: 2018/5/30 14:09:29 | Tags: python, python-2.7, pandas, group-by, resampling


Problem description

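I am using Pandas to structure and process data.

I have a DataFrame with dates as the index, plus Id and bitrate columns.
I want to group my data by Id, at the same time resample the timestamps that belong to each Id to one-second steps, and finally keep the bitrate value for every timestamp.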

For example, given:

    import pandas as pd

    df = pd.DataFrame(
        {'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
         'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
         'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
         'bitrate': ['3750000', '3750000'],
         'type': ['vod', 'catchup'],
         'unique_id': ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22']})

which gives:

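This is my code to get a single dates column, with the Id and the bitrate repeated for every timestamp:

    # drop the columns that are not needed
    df = df.drop(['type', 'unique_id'], axis=1)
    # parse both time columns as datetimes
    df.beginning_time = pd.to_datetime(df.beginning_time)
    df.end_time = pd.to_datetime(df.end_time)
    # stack beginning_time and end_time into one 'dates' column
    df = pd.melt(df, id_vars=['Id', 'bitrate'], value_name='dates').drop('variable', axis=1)
    df.set_index('dates', inplace=True)

which gives: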

And now, time to resample! This is my code:

    print(df.groupby('Id').resample('1S').ffill())

And this is the result:

This is exactly what I want to do!
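For readers on a current pandas release, the working two-row example above can be reproduced end to end roughly as follows. This is a sketch assuming pandas 1.x or later: it uses the lowercase `'1s'` alias (pandas 2.2 deprecated the uppercase `'S'`), and it selects the `bitrate` column before resampling so the grouping column is not resampled along with it. The names `long_df` and `resampled` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
     'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
     'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
     'bitrate': ['3750000', '3750000'],
     'type': ['vod', 'catchup'],
     'unique_id': ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30',
                   'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22']})

# Stack the two time columns into one 'dates' column, one row per timestamp.
long_df = (df.drop(columns=['type', 'unique_id'])
             .melt(id_vars=['Id', 'bitrate'], value_name='dates')
             .drop(columns='variable'))
long_df['dates'] = pd.to_datetime(long_df['dates'])
long_df = long_df.set_index('dates')

# Resample each Id separately to 1-second steps, forward-filling the bitrate.
# The result is a Series indexed by (Id, dates).
resampled = long_df.groupby('Id')['bitrate'].resample('1s').ffill()
print(resampled.groupby(level='Id').size())
```

Each Id gets one row per second between its own first and last timestamp, which is why the two groups end up with very different lengths.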
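I have 38279 logs with the same columns, and when I do the same thing on them I get an error message. The first part works perfectly and gives this:

The part df.groupby('Id').resample('1S').ffill() gives this error message:

    ValueError: cannot reindex a non-unique index with a method or limit

Any ideas? Thanks!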
Solution
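It seems the problem is duplicated values in the columns beginning_time and end_time: after the melt, the same Id can end up with the same timestamp more than once, so its group no longer has a unique index. I tried to simulate it: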

    df = pd.DataFrame(
        {'Id': ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
         'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'],
         'end_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
         'bitrate': ['3750000', '3750000', '444'],
         'type': ['vod', 'catchup', 's'],
         'unique_id': ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22', 'w']})
    print(df)
                     Id       beginning_time  bitrate             end_time  \
    0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42
    1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42
    2                 a  2016-07-08 02:17:45      444  2016-07-08 02:17:42

          type                             unique_id
    0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30
    1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22
    2        s                                     w

    df = df.drop(['type', 'unique_id'], axis=1)
    df.beginning_time = pd.to_datetime(df.beginning_time)
    df.end_time = pd.to_datetime(df.end_time)
    df = pd.melt(df, id_vars=['Id', 'bitrate'], value_name='dates').drop('variable', axis=1)
    df.set_index('dates', inplace=True)
    print(df)
                                       Id  bitrate
    dates
    2016-07-08 02:17:42  CODI126640013.ts  3750000
    2016-07-08 02:17:42  CODI126622312.ts  3750000
    2016-07-08 02:17:45                 a      444
    2016-07-08 02:17:42  CODI126640013.ts  3750000
    2016-07-08 02:17:42  CODI126622312.ts  3750000
    2016-07-08 02:17:42                 a      444

    print(df.groupby('Id').resample('1S').ffill())

    ValueError: cannot reindex a non-unique index with a method or limit
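The failure can be isolated to a single group. Below is a hypothetical minimal reproduction (the frame name `one_id` is made up for illustration): one Id whose beginning and end timestamps coincide, so after the melt its DatetimeIndex contains the same label twice. Resampling with `ffill` has to reindex that index with a fill method, which pandas refuses for a non-unique index. The exact wording of the `ValueError` differs between pandas versions, so the sketch only relies on the exception type.

```python
import pandas as pd

# One Id whose beginning_time equals its end_time, like rows 0 and 1 above.
one_id = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126640013.ts'],
     'bitrate': ['3750000', '3750000']},
    index=pd.DatetimeIndex(['2016-07-08 02:17:42', '2016-07-08 02:17:42'],
                           name='dates'))

print(one_id.index.is_unique)  # False: the same timestamp appears twice

try:
    one_id.resample('1s').ffill()
    raised = False
except ValueError as exc:
    # Message depends on the pandas version (older releases print
    # "cannot reindex a non-unique index with a method or limit").
    print('ValueError:', exc)
    raised = True
```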

One possible solution is to add drop_duplicates and use the older way of combining resample with groupby, via apply:
    df = df.drop(['type', 'unique_id'], axis=1)
    df.beginning_time = pd.to_datetime(df.beginning_time)
    df.end_time = pd.to_datetime(df.end_time)
    df = pd.melt(df, id_vars=['Id', 'bitrate'], value_name='dates').drop('variable', axis=1)
    print(df.groupby('Id').apply(lambda x: x.drop_duplicates('dates')
                                            .set_index('dates')
                                            .resample('1S')
                                            .ffill()))
                                                        Id  bitrate
    Id               dates
    CODI126622312.ts 2016-07-08 02:17:42  CODI126622312.ts  3750000
    CODI126640013.ts 2016-07-08 02:17:42  CODI126640013.ts  3750000
    a                2016-07-08 02:17:41                 a      444
                     2016-07-08 02:17:42                 a      444
                     2016-07-08 02:17:43                 a      444
                     2016-07-08 02:17:44                 a      444
                     2016-07-08 02:17:45                 a      444
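On newer pandas, where `groupby(...).resample(...)` is the usual spelling, the same idea (deduplicate first, then resample) can be sketched without `apply`. This assumes that keeping the first row for each repeated (Id, dates) pair is acceptable, which is what `drop_duplicates` does by default; if one Id could report two different bitrates in the same second, you would need to decide which one wins. The intermediate names `long_df`, `clean` and `resampled` are illustrative.

```python
import pandas as pd

# The simulated data from the answer above.
df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
     'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'],
     'end_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
     'bitrate': ['3750000', '3750000', '444'],
     'type': ['vod', 'catchup', 's'],
     'unique_id': ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30',
                   'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22', 'w']})

long_df = (df.drop(columns=['type', 'unique_id'])
             .melt(id_vars=['Id', 'bitrate'], value_name='dates')
             .drop(columns='variable'))
long_df['dates'] = pd.to_datetime(long_df['dates'])

# Keep one row per (Id, timestamp) so every group gets a unique DatetimeIndex,
# and sort so each group's timestamps are in ascending order.
clean = (long_df.drop_duplicates(subset=['Id', 'dates'])
                .sort_values(['Id', 'dates'])
                .set_index('dates'))

# Resample per Id to 1-second steps; only the bitrate column is resampled.
resampled = clean.groupby('Id')['bitrate'].resample('1s').ffill()
print(resampled)
```

Deduplicating on the (Id, dates) pair rather than on dates alone matters once the data is no longer split per group: two different Ids may legitimately share a timestamp.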
You can also check for duplicates with boolean indexing on the original DataFrame (before the melt):
    print(df[df.beginning_time == df.end_time])
                     Id       beginning_time  bitrate             end_time  \
    0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42
    1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42

          type                             unique_id
    0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30
    1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22
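The boolean check above only finds rows whose beginning_time equals end_time. In a larger log, duplicate timestamps can also appear when the same Id occurs in several rows that share a timestamp, so a broader check is to look for repeated (Id, dates) pairs after the melt. A minimal sketch, assuming `dates` is still a column (that is, before `set_index`); the helper name `find_duplicate_timestamps` is hypothetical:

```python
import pandas as pd


def find_duplicate_timestamps(long_df):
    """Return every row whose (Id, dates) pair occurs more than once."""
    # keep=False marks all members of a duplicate set, not just the repeats.
    mask = long_df.duplicated(subset=['Id', 'dates'], keep=False)
    return long_df[mask].sort_values(['Id', 'dates'])


# Melted data shaped like the simulated example: two Ids repeat 02:17:42.
long_df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts', 'a',
            'CODI126640013.ts', 'CODI126622312.ts', 'a'],
     'bitrate': ['3750000', '3750000', '444', '3750000', '3750000', '444'],
     'dates': pd.to_datetime(['2016-07-08 02:17:42', '2016-07-08 02:17:42',
                              '2016-07-08 02:17:45', '2016-07-08 02:17:42',
                              '2016-07-08 02:17:42', '2016-07-08 02:17:42'])})

dups = find_duplicate_timestamps(long_df)
print(dups)
print(dups['Id'].unique())
```

Running this on the full 38279-row log would list exactly the Ids whose groups make `resample(...).ffill()` fail.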