Resampling Error: cannot reindex a non-unique index with a method or limit


Problem Description


I am using Pandas to structure and process data.

I have a DataFrame with dates as the index, plus Id and bitrate columns. I want to group my data by Id and, at the same time, resample the timestamps that belong to each Id, keeping the bitrate value.

For example, given:

df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts'],
'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:05:35'], 
'end_time' :['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
'bitrate': ['3750000', '3750000'],
'type' : ['vod', 'catchup'],
'unique_id' : ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22']})

which gives a two-row DataFrame, one row per Id, each with a beginning_time and an end_time.

This is my code to get a single dates column, where every date keeps its Id and bitrate:

df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)

which gives:
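(Reconstructed here, since the screenshot from the original post is not available; the melted, indexed frame should look like this:)

                                   Id  bitrate
dates                                         
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:05:35  CODI126622312.ts  3750000
2016-07-08 02:17:55  CODI126640013.ts  3750000
2016-07-08 02:26:11  CODI126622312.ts  3750000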

And now, time to resample! This is my code:

print (df.groupby('Id').resample('1S').ffill())

And this is the result: one row per second for each Id, forward-filled from its beginning_time to its end_time.

This is exactly what I want! But I have 38279 logs with the same columns, and when I do the same thing on them I get an error message. The first part (the melt and set_index) works perfectly.

The (df.groupby('Id').resample('1S').ffill()) part gives this error message:

ValueError: cannot reindex a non-unique index with a method or limit

Any ideas? Thanks!

Solution

It seems there is a problem with duplicates in the beginning_time and end_time columns. I try to simulate it:

df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'], 
'end_time' :['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
'bitrate': ['3750000', '3750000', '444'],
'type' : ['vod', 'catchup', 's'],
'unique_id':['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22','w']})

print (df)  
                 Id       beginning_time  bitrate             end_time  \
0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
2                 a  2016-07-08 02:17:45      444  2016-07-08 02:17:42   

      type                             unique_id  
0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30  
1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22  
2        s                                     w  

df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)


print (df)  
                                   Id  bitrate
dates                                         
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:17:42  CODI126622312.ts  3750000
2016-07-08 02:17:45                 a      444
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:17:42  CODI126622312.ts  3750000
2016-07-08 02:17:42                 a      444

print (df.groupby('Id').resample('1S').ffill())

ValueError: cannot reindex a non-unique index with a method or limit
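
A quick check, added here as a sketch (it is not part of the original answer): counting the (Id, dates) pairs that occur more than once in the melted frame confirms the duplicates that break the chained resample:

# 'dates' is currently the index, so bring it back as a column first;
# for the simulated frame this prints 2: the two rows whose beginning_time
# equals their end_time each produce the same (Id, timestamp) pair twice
print (df.reset_index().duplicated(subset=['Id', 'dates']).sum())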

One possible solution is to add drop_duplicates and use the old way of resample with groupby (the code below starts again from the original df):

df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)

print (df.groupby('Id').apply(lambda x : x.drop_duplicates('dates')
                                          .set_index('dates')
                                          .resample('1S')
                                          .ffill()))

                                                    Id  bitrate
Id               dates                                         
CODI126622312.ts 2016-07-08 02:17:42  CODI126622312.ts  3750000
CODI126640013.ts 2016-07-08 02:17:42  CODI126640013.ts  3750000
a                2016-07-08 02:17:41                 a      444
                 2016-07-08 02:17:42                 a      444
                 2016-07-08 02:17:43                 a      444
                 2016-07-08 02:17:44                 a      444
                 2016-07-08 02:17:45                 a      444
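
An alternative sketch (not from the original answer): if duplicated (Id, dates) rows always carry the same bitrate, they can simply be dropped up front, after which the chained syntax from the question works again:

# dedupe the melted frame (df here still has 'dates' as a plain column),
# then reuse the groupby/resample chain from the question
df2 = df.drop_duplicates(subset=['Id', 'dates']).set_index('dates')
print (df2.groupby('Id').resample('1S').ffill())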

You can also check for these duplicates with boolean indexing on the original df:

print (df[df.beginning_time == df.end_time])
                 Id       beginning_time  bitrate             end_time  \
0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   

      type                             unique_id  
0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30  
1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22  
