Resampling Error: cannot reindex a non-unique index with a method or limit
I am using pandas to structure and process data.
I have a DataFrame with dates as the index, plus Id and bitrate columns. I want to group the data by Id, resample the datetimes that belong to each Id, and keep the bitrate value.
For example, given:
import pandas as pd

df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts'],
'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
'end_time' :['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
'bitrate': ['3750000', '3750000'],
'type' : ['vod', 'catchup'],
'unique_id' : ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22']})
which gives :
This is my code to build a single dates column, with the Id and bitrate repeated for each date:
df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)
which gives :
Now for the resampling. This is my code:
print (df.groupby('Id').resample('1S').ffill())
And this is the result:
This is exactly what I want. However, when I run the same code on my real data (38279 log rows with the same columns), I get an error. The first part works perfectly and gives this:
The part (df.groupby('Id').resample('1S').ffill()) gives this error message:
ValueError: cannot reindex a non-unique index with a method or limit
Any ideas? Thanks!
The problem seems to be duplicated timestamps in the beginning_time and end_time columns. I tried to simulate it:
df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'],
'end_time' :['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
'bitrate': ['3750000', '3750000', '444'],
'type' : ['vod', 'catchup', 's'],
'unique_id':['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22','w']})
print (df)
Id beginning_time bitrate end_time \
0 CODI126640013.ts 2016-07-08 02:17:42 3750000 2016-07-08 02:17:42
1 CODI126622312.ts 2016-07-08 02:17:42 3750000 2016-07-08 02:17:42
2 a 2016-07-08 02:17:45 444 2016-07-08 02:17:42
type unique_id
0 vod f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30
1 catchup f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22
2 s w
df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)
print (df)
Id bitrate
dates
2016-07-08 02:17:42 CODI126640013.ts 3750000
2016-07-08 02:17:42 CODI126622312.ts 3750000
2016-07-08 02:17:45 a 444
2016-07-08 02:17:42 CODI126640013.ts 3750000
2016-07-08 02:17:42 CODI126622312.ts 3750000
2016-07-08 02:17:42 a 444
print (df.groupby('Id').resample('1S').ffill())
ValueError: cannot reindex a non-unique index with a method or limit
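As far as I can tell, the error comes from the forward fill: upsampling places each group onto a regular one-second grid, and filling by method requires every original timestamp to appear only once. A minimal sketch of the same failure on a plain Series (the names `s` and `grid` are illustrative and not from the original post):

```python
import pandas as pd

# Two rows share the same timestamp, as in the melted frame above.
idx = pd.to_datetime(["2016-07-08 02:17:42", "2016-07-08 02:17:42"])
s = pd.Series(["3750000", "3750000"], index=idx)

# A regular one-second grid to fill onto.
grid = pd.date_range("2016-07-08 02:17:42", periods=3, freq="1s")

try:
    # Forward-filling onto the grid needs a unique source index.
    s.reindex(grid, method="ffill")
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```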
One possible solution is to add drop_duplicates and use the older approach of calling resample inside groupby.apply:
df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
print (df.groupby('Id').apply(lambda x : x.drop_duplicates('dates')
.set_index('dates')
.resample('1S')
.ffill()))
Id bitrate
Id dates
CODI126622312.ts 2016-07-08 02:17:42 CODI126622312.ts 3750000
CODI126640013.ts 2016-07-08 02:17:42 CODI126640013.ts 3750000
a 2016-07-08 02:17:41 a 444
2016-07-08 02:17:42 a 444
2016-07-08 02:17:43 a 444
2016-07-08 02:17:44 a 444
2016-07-08 02:17:45 a 444
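As an alternative sketch, not the answer's own code: once repeated (Id, dates) pairs are dropped before the index is set, the chained groupby/resample call from the question should run as well. This assumes that exact duplicates within an Id carry the same bitrate, so keeping the first one loses nothing. It also uses the lowercase '1s' alias, because newer pandas releases deprecate the uppercase 'S'; `long_df`, `clean` and `out` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["CODI126640013.ts", "CODI126622312.ts", "a"],
    "beginning_time": ["2016-07-08 02:17:42", "2016-07-08 02:17:42", "2016-07-08 02:17:45"],
    "end_time": ["2016-07-08 02:17:42", "2016-07-08 02:17:42", "2016-07-08 02:17:42"],
    "bitrate": ["3750000", "3750000", "444"],
})
df["beginning_time"] = pd.to_datetime(df["beginning_time"])
df["end_time"] = pd.to_datetime(df["end_time"])

# One row per (Id, bitrate, timestamp), as in the question.
long_df = pd.melt(df, id_vars=["Id", "bitrate"], value_name="dates").drop("variable", axis=1)

# Assumption: a repeated timestamp within one Id is redundant, so keep the first.
clean = (
    long_df.drop_duplicates(subset=["Id", "dates"])
    .sort_values(["Id", "dates"])
    .set_index("dates")
)

# With a unique index inside every group, the forward fill can proceed.
out = clean.groupby("Id").resample("1s").ffill()
print(out)
```

Dropping duplicates on the pair rather than on dates alone keeps rows from different Ids that happen to share a timestamp.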
You can also check for duplicates with boolean indexing:
# run this on the original DataFrame, before drop and melt
print (df[df.beginning_time == df.end_time])
Id beginning_time bitrate end_time \
0 CODI126640013.ts 2016-07-08 02:17:42 3750000 2016-07-08 02:17:42
1 CODI126622312.ts 2016-07-08 02:17:42 3750000 2016-07-08 02:17:42
type unique_id
0 vod f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30
1 catchup f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22
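The comparison above only catches rows where a session starts and ends in the same second. A somewhat broader check, sketched here on the assumption that it runs on the melted frame (columns Id, bitrate, dates) before set_index, is to flag every repeated (Id, dates) pair with duplicated. The frame `long_df` below is typed out by hand to mirror the simulated data:

```python
import pandas as pd

# Melted sample data: three beginning_time rows followed by three end_time rows.
long_df = pd.DataFrame({
    "Id": ["CODI126640013.ts", "CODI126622312.ts", "a",
           "CODI126640013.ts", "CODI126622312.ts", "a"],
    "bitrate": ["3750000", "3750000", "444", "3750000", "3750000", "444"],
    "dates": pd.to_datetime([
        "2016-07-08 02:17:42", "2016-07-08 02:17:42", "2016-07-08 02:17:45",
        "2016-07-08 02:17:42", "2016-07-08 02:17:42", "2016-07-08 02:17:42",
    ]),
})

# keep=False marks every member of a duplicated (Id, dates) pair, not just the repeats.
dup_mask = long_df.duplicated(subset=["Id", "dates"], keep=False)
print(long_df[dup_mask])
```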