Python pandas change duplicate timestamp to unique
I have a file containing duplicate timestamps, at most two per timestamp. They are not really duplicates; the second occurrence just needs a millisecond component added. For example, the file contains:
....
2011/1/4 9:14:00
2011/1/4 9:15:00
2011/1/4 9:15:01
2011/1/4 9:15:01
2011/1/4 9:15:02
2011/1/4 9:15:02
2011/1/4 9:15:03
2011/1/4 9:15:03
2011/1/4 9:15:04
....
I would like to change them to:
2011/1/4 9:14:00
2011/1/4 9:15:00
2011/1/4 9:15:01
2011/1/4 9:15:01.500
2011/1/4 9:15:02
2011/1/4 9:15:02.500
2011/1/4 9:15:03
2011/1/4 9:15:03.500
2011/1/4 9:15:04
....
What is the most efficient way to perform this task?
Setup
In [69]: df = DataFrame(dict(time = x))
In [70]: df
Out[70]:
time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04
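The session does not show how `x` was built. As a minimal sketch, one way to get an equivalent frame, assuming each second appears exactly twice (the `date_range(...).repeat(2)` construction is an assumption, not the answerer's code):

```python
import pandas as pd

# Assumed reconstruction of `x`: five one-second steps, each repeated twice.
x = pd.date_range("2013-01-01 09:01:00", periods=5, freq="s").repeat(2)

# Same shape as the frame printed above: one datetime column named "time".
df = pd.DataFrame({"time": x})
```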
Find the rows where the time difference from the previous row is 0 seconds:
In [71]: mask = (df.time-df.time.shift()) == np.timedelta64(0,'s')
In [72]: mask
Out[72]:
0 False
1 True
2 False
3 True
4 False
5 True
6 False
7 True
8 False
9 True
Name: time, dtype: bool
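For comparison, here is a hedged sketch of the same mask written with `Series.diff()`, next to an alternative based on `Series.duplicated()`. The sample values and variable names are illustrative only. The two masks agree when repeats are adjacent, as in the data above, but `duplicated()` would also flag repeats that are not next to each other.

```python
import pandas as pd

s = pd.Series(pd.to_datetime([
    "2013-01-01 09:01:00",
    "2013-01-01 09:01:00",
    "2013-01-01 09:01:01",
    "2013-01-01 09:01:01",
]), name="time")

# Same test as above: the gap to the previous row is exactly zero.
# The first row has no predecessor (NaT), which compares as False.
mask_diff = s.diff() == pd.Timedelta(0)

# Alternative: flag every repeat of an already-seen value, adjacent or not.
mask_dup = s.duplicated()
```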
Set those locations to use an offset of 5 milliseconds (in your question you used 500, but it could be anything). This requires numpy >= 1.7. (Note that this syntax will be changing in pandas 0.13 to allow the more direct df.loc[mask,'time'] += pd.offsets.Milli(5).)
In [73]: df.loc[mask,'time'] = df.time[mask].apply(lambda x: x+pd.offsets.Milli(5))
In [74]: df
Out[74]:
time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00.005000
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01.005000
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02.005000
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03.005000
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04.005000
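On a recent pandas version the whole adjustment can probably be written without `apply`. The following is a sketch under assumptions, not the method from the answer above: it uses `groupby(...).cumcount()` to number the repeats of each timestamp and adds that count times 500 ms, the offset from the question. It would also spread out more than two repeats, provided there are fewer repeats than fit in the gap to the next timestamp.

```python
import pandas as pd

# Sample values taken from the question.
times = pd.to_datetime([
    "2011-01-04 09:14:00",
    "2011-01-04 09:15:00",
    "2011-01-04 09:15:01",
    "2011-01-04 09:15:01",
    "2011-01-04 09:15:02",
    "2011-01-04 09:15:02",
])
df = pd.DataFrame({"time": times})

# Position of each row within its group of identical timestamps: 0, 1, ...
rank = df.groupby("time").cumcount()

# Shift the n-th repeat by n * 500 ms; first occurrences (rank 0) are unchanged.
df["time"] = df["time"] + pd.to_timedelta(rank * 500, unit="ms")
```

After this, `df["time"].is_unique` can be used to check that no duplicates remain.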