Efficient timedelta calculator

Problem description

I have time series data from a data logger that time-stamps each record with up to microsecond precision, in the form DD.MM.YYYY HH:MM:SS.xxx[yyy] (e.g. --[ 29.08.2018 16:26:31.406 ] --), where xxx is milliseconds and yyy is microseconds. As you can imagine, a file recorded over just a few minutes can be very big (hundreds of megabytes). I need to plot a bunch of data from this file against time, ideally in milliseconds. The data looks like this:

So I need to parse these dates in Python, calculate the timedelta between samples to find the elapsed time, and then generate plots. For example, when I subtract these two time stamps (--[ 29.08.2018 16:23:41.052 ] -- and --[ 29.08.2018 16:23:41.114 ] --), I want to get 62 milliseconds as the time elapsed between them.
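The arithmetic itself is straightforward once the stamps are datetime objects. A minimal sketch, building the two example stamps directly rather than parsing them, shows the 62 ms difference. Dividing one timedelta by another returns their ratio as a float, which avoids multiplying total_seconds() by 1000:

```python
from datetime import datetime, timedelta

# The two example stamps from the question, built directly for illustration.
t1 = datetime(2018, 8, 29, 16, 23, 41, 52_000)   # 16:23:41.052
t2 = datetime(2018, 8, 29, 16, 23, 41, 114_000)  # 16:23:41.114

delta = t2 - t1  # a timedelta of 62 000 microseconds

# timedelta / timedelta is "how many milliseconds fit into delta".
elapsed_ms = delta / timedelta(milliseconds=1)
print(elapsed_ms)  # 62.0
```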

Currently I am using dateparser (import dateparser as dp), which returns a datetime after parsing; I then subtract those to get a timedelta and convert it to milliseconds or seconds as needed. But this parsing step takes too long and is the bottleneck in my post-processing script.

Could anyone suggest a better library that is more efficient at parsing dates and calculating timedeltas?

Inefficient code

import dateparser as dp

def timedelta_local(date1, date2):
    # Parse both stamps with dateparser; this call is the slow part
    delta = dp.parse(date2) - dp.parse(date1)
    # Express the same difference in several units
    timediff = {'us': delta.microseconds + delta.seconds*1000000 + delta.days*24*60*60*1000000,
                'ms': delta.microseconds/1000 + delta.seconds*1000 + delta.days*24*60*60*1000,
                'sec': delta.microseconds/1000000 + delta.seconds + delta.days*24*60*60,
                'minutes': delta.microseconds/1000000/60 + delta.seconds/60 + delta.days*24*60
               }
    return timediff

Thanks in advance
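Since the logger always writes the same layout, the format does not have to be guessed for every string. A minimal sketch of a drop-in replacement for timedelta_local, assuming every stamp looks like --[ 29.08.2018 16:23:41.052 ] -- (day first, fractional seconds after the dot), parses with the standard library's datetime.strptime and a fixed format instead of dateparser. STAMP_FORMAT and parse_stamp are hypothetical names introduced here:

```python
from datetime import datetime, timedelta

# Assumed layout of a stamp once the "--[ " / " ] --" decoration is removed:
# day.month.year hours:minutes:seconds.fraction (%f accepts 1 to 6 digits)
STAMP_FORMAT = "%d.%m.%Y %H:%M:%S.%f"

def parse_stamp(raw):
    """Hypothetical helper: strip the decoration, then parse with a fixed format."""
    return datetime.strptime(raw.strip(" -[]"), STAMP_FORMAT)

def timedelta_local(date1, date2):
    delta = parse_stamp(date2) - parse_stamp(date1)
    # Dividing by a unit timedelta converts to that unit, days included.
    return {
        "us": delta // timedelta(microseconds=1),   # integer microseconds
        "ms": delta / timedelta(milliseconds=1),
        "sec": delta.total_seconds(),
        "minutes": delta / timedelta(minutes=1),
    }

print(timedelta_local("--[ 29.08.2018 16:23:41.052 ] --",
                      "--[ 29.08.2018 16:23:41.114 ] --"))
```

Dividing by a unit timedelta (or calling total_seconds()) already accounts for the days component, so the manual microseconds/seconds/days bookkeeping is not needed. A fixed-format strptime is usually much cheaper than general-purpose parsing, though it is worth timing on your own files.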

Recommended answer

@zvone is correct here. pandas is your best friend for this. Here is some sample code that will hopefully get you on the right track. It assumes your data is in a CSV file with a header line like the one you show in your example. I wasn't sure whether you wanted to keep the time difference as a timedelta object (easy for doing further math with) or just simplify it to a float. I did both.

import pandas as pd

df = pd.read_csv("test.csv", parse_dates=[0])

# What are the data types after the initial import?

print(f'{df.dtypes}\n\n')

# What are the contents of the data frame?

print(f'{df}\n\n')

# Create a new column that strips away leading and trailing characters 
# that surround the data we want

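df['Clean Time Stamp'] = df['Time Stamp'].str.strip(' -[]')

# Convert to a pandas Timestamp. An explicit format avoids per-row format
# guessing and rules out day/month ambiguity (infer_datetime_format is
# deprecated since pandas 2.0 and has no effect).

df['Real Time Stamp'] = pd.to_datetime(df['Clean Time Stamp'], format='%d.%m.%Y %H:%M:%S.%f')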

# Calculate time difference between successive rows

df['Delta T'] = df['Real Time Stamp'].diff()

# Convert pandas timedelta to a floating point value in milliseconds.

df['Delta T ms'] = df['Delta T'].dt.total_seconds() * 1000

print(f'{df.dtypes}\n\n')
print(df)

The output looks like this. Note that the wide dataframe printout wraps its columns onto additional lines; this is just an artifact of printing it.

Time Stamp     object
 Limit A        int64
 Value A      float64
 Limit B        int64
 Value B      float64
dtype: object


                         Time Stamp   Limit A   Value A   Limit B   Value B
0  --[ 29.08.2018 16:23:41.052 ] --        15     3.109        30     2.907
1  --[ 29.08.2018 16:23:41.114 ] --        15     3.020        30     8.242


Time Stamp                   object
 Limit A                      int64
 Value A                    float64
 Limit B                      int64
 Value B                    float64
Clean Time Stamp             object
Real Time Stamp      datetime64[ns]
Delta T             timedelta64[ns]
Delta T ms                  float64
dtype: object


                         Time Stamp   Limit A   Value A   Limit B   Value B  \
0  --[ 29.08.2018 16:23:41.052 ] --        15     3.109        30     2.907   
1  --[ 29.08.2018 16:23:41.114 ] --        15     3.020        30     8.242   

            Clean Time Stamp         Real Time Stamp         Delta T  \
0   29.08.2018 16:23:41.052  2018-08-29 16:23:41.052             NaT   
1   29.08.2018 16:23:41.114  2018-08-29 16:23:41.114 00:00:00.062000   

   Delta T ms  
0         NaN  
1        62.0  

If your files are large you may gain some efficiency by editing columns in place rather than creating new ones like I did.
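A sketch of that in-place variant, assuming the file is comma-plus-space separated as the printed dtypes (column names with a leading space) suggest. Here csv_text is hypothetical stand-in data built from the two rows shown above, so the snippet runs without test.csv. It overwrites the Time Stamp column instead of adding Clean/Real columns, and keeps only the millisecond float:

```python
import io

import pandas as pd

# Hypothetical stand-in for the logger's CSV file, shaped like the example rows.
csv_text = """Time Stamp, Limit A, Value A, Limit B, Value B
--[ 29.08.2018 16:23:41.052 ] --, 15, 3.109, 30, 2.907
--[ 29.08.2018 16:23:41.114 ] --, 15, 3.020, 30, 8.242
"""

# skipinitialspace drops the blank after each comma, so the column names
# come out as "Limit A" rather than " Limit A".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Overwrite the stamp column in place: vectorized strip of the decoration,
# then parse everything with one explicit format.
df["Time Stamp"] = pd.to_datetime(
    df["Time Stamp"].str.strip(" -[]"), format="%d.%m.%Y %H:%M:%S.%f"
)

# Milliseconds between successive rows; the first row has no predecessor (NaN).
df["Delta T ms"] = df["Time Stamp"].diff() / pd.Timedelta(milliseconds=1)
print(df)
```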
