Python pandas change duplicate timestamp to unique

Views: 165  Published: 2017/7/21 19:13:33  Tags: python duplicates pandas

Question

I have a file containing duplicate timestamps, at most two for each timestamp. They are not really duplicates; the second one of each pair just needs a millisecond component added. For example, the file contains:

....
2011/1/4    9:14:00
2011/1/4    9:15:00
2011/1/4    9:15:01
2011/1/4    9:15:01
2011/1/4    9:15:02
2011/1/4    9:15:02
2011/1/4    9:15:03
2011/1/4    9:15:03
2011/1/4    9:15:04
....

I would like to change them into

2011/1/4    9:14:00
2011/1/4    9:15:00
2011/1/4    9:15:01
2011/1/4    9:15:01.500
2011/1/4    9:15:02
2011/1/4    9:15:02.500
2011/1/4    9:15:03
2011/1/4    9:15:03.500
2011/1/4    9:15:04
....

What is the most efficient way to perform such a task?

Solution

Setup

In [69]: df = DataFrame(dict(time = x))

In [70]: df
Out[70]: 
                 time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04
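
The setup above does not show how x was built. A minimal sketch of one way to get an equivalent frame, assuming x is simply five consecutive seconds with each value repeated twice (the name seconds is made up for this illustration):

```python
import pandas as pd

# Assumption: x is five consecutive seconds, each appearing twice,
# which matches the frame printed in Out[70].
seconds = pd.date_range("2013-01-01 09:01:00", periods=5, freq="s")
x = seconds.repeat(2)

df = pd.DataFrame(dict(time=x))
print(df)
```

Any other way of producing a sorted datetime column with adjacent duplicates should work just as well for the steps that follow.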

Find the locations where the difference in time from the previous row is 0 seconds

In [71]: mask = (df.time-df.time.shift()) == np.timedelta64(0,'s')

In [72]: mask
Out[72]: 
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
Name: time, dtype: bool
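
For comparison, here is a small sketch of two other ways to build the same mask in current pandas. This is not part of the original answer; the sample data is shortened and the names mask_diff and mask_dup are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime([
    "2013-01-01 09:01:00", "2013-01-01 09:01:00",
    "2013-01-01 09:01:01", "2013-01-01 09:01:01",
    "2013-01-01 09:01:02",
])})

# Same idea as the shift-based mask: a zero gap to the previous row.
# The first row has no previous row (NaT), so it compares as False.
mask_diff = df["time"].diff() == pd.Timedelta(0)

# Alternative: flag every value that has already been seen earlier.
mask_dup = df["time"].duplicated()

print(mask_diff.tolist())
print(mask_dup.tolist())
```

The two masks agree when duplicates are adjacent, as they are here. With unsorted data they can differ: duplicated() also flags repeats that are far apart, while the diff-based mask only looks at the previous row.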

Set those locations to use an offset of 5 milliseconds (in your question you used 500, but it could be anything). This requires numpy >= 1.7. (Note that this syntax will be changing in pandas 0.13 to allow a more direct df.loc[mask,'time'] += pd.offsets.Milli(5).)

In [73]: df.loc[mask,'time'] = df.time[mask].apply(lambda x: x+pd.offsets.Milli(5))

In [74]: df
Out[74]: 
                        time
0        2013-01-01 09:01:00
1 2013-01-01 09:01:00.005000
2        2013-01-01 09:01:01
3 2013-01-01 09:01:01.005000
4        2013-01-01 09:01:02
5 2013-01-01 09:01:02.005000
6        2013-01-01 09:01:03
7 2013-01-01 09:01:03.005000
8        2013-01-01 09:01:04
9 2013-01-01 09:01:04.005000
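
Putting the steps together with the 500 ms offset from the question, a self-contained sketch might look like the following. It assumes a reasonably recent pandas, where the direct in-place addition mentioned in the note above is available. The groupby/cumcount variant at the end is an extra illustration, not something the original answer proposed, and the names raw, occurrence and alt are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime([
    "2011-01-04 09:15:00",
    "2011-01-04 09:15:01", "2011-01-04 09:15:01",
    "2011-01-04 09:15:02", "2011-01-04 09:15:02",
])})
raw = df["time"].copy()  # keep the untouched values for the variant below

# Rows whose timestamp equals the previous row's timestamp.
mask = df["time"].diff() == pd.Timedelta(0)

# Shift only those rows by half a second, in place.
df.loc[mask, "time"] += pd.Timedelta(milliseconds=500)
print(df)

# Variant: number the occurrences of each timestamp (0 for the first,
# 1 for the second) and scale that count into an offset.
occurrence = raw.groupby(raw).cumcount()
alt = raw + pd.to_timedelta(occurrence * 500, unit="ms")
print(alt)
```

With at most two rows per timestamp, both approaches give the same result. If a timestamp could appear three or more times, the fixed 500 ms step would need to be smaller so the shifted values do not run into the next second.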
