How to include dynamic time?

Views: 219  Posted: 2017/2/24 21:31:03  Tags: python csv datetime numpy pandas


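Problem description

I am trying to pull logs with respect to time slots. The program below runs fine when a number of hours is given, and the logs in that range get extracted.

But now I also want the start and end to be given dynamically, e.g. between 8 am and 8 pm, or 6 am to 8 am, and so on.

How do I get that? Either an edit to the current program or a separate program will do.

Input: Mini Version of INPUT

Code: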

    import pandas as pd
    from datetime import datetime,time
    import numpy as np

    fn = r'00_Dart.csv'
    cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
    df = pd.read_csv(fn, header=None, names=cols)

    df['m'] = df.StopTime + df.StartTime
    df['d'] = df.StopTime - df.StartTime

    # 'start' and 'end' for the reporting DF: `r`
    # which will contain equal intervals (1 hour in this case)
    start = pd.to_datetime(df.StartTime.min(), unit='s').date()
    end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

    # building reporting DF: `r`
    freq = '1H'    # 1 Hour frequency
    idx = pd.date_range(start, end, freq=freq)
    r = pd.DataFrame(index=idx)
    r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)

    # 1 hour in seconds, minus one second (so that we will not count it twice)
    interval = 60*60 - 1

    r['LogCount'] = 0
    r['UniqueIDCount'] = 0

    for i, row in r.iterrows():
        # intervals overlap test
        # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
        # i've slightly simplified the calculations of m and d
        # by getting rid of division by 2,
        # because it can be done eliminating common terms
        u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
        r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

    r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
    r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
    r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
    r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

    #r.to_csv('results.csv', index=False)
    #print(r[r.LogCount > 0])
    #print (r['StartTime'], r['EndTime'], r['Day'], r['LogCount'], r['UniqueIDCount'])

    rout = r[['Date', 'StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']]
    #print rout
    rout.to_csv('one_hour.csv', index=False, header=False)
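A note for readers on current pandas: the script above dates from 2016, and `.ix`, `.dt.weekday_name` and `pd.datetime` have since been removed from pandas (in the 1.0 and 2.0 releases). The sketch below shows the same reporting loop with `.loc`, `.dt.day_name()` and `pd.Timestamp` swapped in. The three log rows are made up for illustration and stand in for `00_Dart.csv`; this is not the asker's data, and `normalize()` is used here instead of `.date()` purely for convenience.

```python
import numpy as np
import pandas as pd

# Made-up log rows (Unix seconds, 2016-07-01 UTC) standing in for 00_Dart.csv.
df = pd.DataFrame({
    'UserID':    [1, 1, 2],
    'StartTime': [1467360000, 1467361800, 1467363600],  # 08:00:00, 08:30:00, 09:00:00
    'StopTime':  [1467360600, 1467364000, 1467364200],  # 08:10:00, 09:06:40, 09:10:00
})
df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

# One day of hourly bins, starting at midnight of the first log's day.
start = pd.to_datetime(df.StartTime.min(), unit='s').normalize()
end = start + pd.Timedelta(days=1)
idx = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
r = pd.DataFrame(index=idx)

# pd.Timestamp is used here in place of pd.datetime.
r['start'] = (r.index - pd.Timestamp(1970, 1, 1)).total_seconds().astype(np.int64)

interval = 60 * 60 - 1  # one hour in seconds, minus one second
r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    u = df[np.abs(df.m - 2 * row['start'] - interval) < df.d + interval].UserID
    # .loc is used here in place of .ix.
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

ts = pd.to_datetime(r['start'], unit='s')
r['Date'] = ts.dt.date
# .dt.day_name() is used here in place of .dt.weekday_name.
r['Day'] = ts.dt.day_name().str[:3]

print(r[r.LogCount > 0])
```

Passing a `pd.Timedelta` as `freq` avoids depending on a particular spelling of the hourly frequency alias.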

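Edit: In simple words, I should be able to give StartTime and EndTime in the program. The code below is very close to what I am trying to do. But how do I convert this to pandas?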

    from datetime import datetime,time

    start = time(8,0,0)
    end = time(20,0,0)

    with open('USC28days_0_20', 'r') as infile, open('USC28days_0_20_time','w') as outfile:
        for row in infile:
            col = row.split()
            t1 = datetime.fromtimestamp(float(col[2])).time()
            t2 = datetime.fromtimestamp(float(col[3])).time()
            print (t1 >= start and t2 <= end)


Edit Two: Working answer in Pandas

Taking a part from @MaxU's selected answer. The code below extracts the required group of logs between the given StartTime and StopTime:

    import pandas as pd
    from datetime import datetime,time
    import numpy as np

    fn = r'00_Dart.csv'
    cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']

    df = pd.read_csv(fn, header=None, names=cols)

    #df['m'] = df.StopTime + df.StartTime
    #df['d'] = df.StopTime - df.StartTime

    # filter input data set ...
    start_hour = 8
    end_hour = 9
    df = df[(pd.to_datetime(df.StartTime, unit='s').dt.hour >= start_hour) & (pd.to_datetime(df.StopTime, unit='s').dt.hour <= end_hour)]

    print(df)

    df.to_csv('time_hour.csv', index=False, header=False)

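But: if there were a way to control minutes and seconds as well, that would be a great solution.

At present this also picks up logs whose StopTime falls within the end hour, including all the minutes and seconds up to the next hour.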

Something like

    start_hour = 8:0:0
    end_hour = 9:0:0 - 1  # -1 to get the logs until 8:59:59

But this gives me an error.
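The error is expected, since `8:0:0` is not a valid Python literal. One possible way to get control over minutes and seconds is to compare `datetime.time` objects, as the plain-Python loop in the first edit does, but on whole columns through the `.dt.time` accessor. The sketch below assumes that approach; the rows are made up, and `DAY0`, `hms` and `selected` are hypothetical names used only for illustration.

```python
from datetime import time

import pandas as pd

DAY0 = 1467331200  # 2016-07-01 00:00:00 UTC, as Unix seconds


def hms(h, m, s):
    """Unix timestamp on DAY0 at the given time of day (illustration only)."""
    return DAY0 + h * 3600 + m * 60 + s


# Made-up rows standing in for the CSV from the question.
df = pd.DataFrame({
    'UserID':    [1, 2, 3, 4],
    'StartTime': [hms(7, 59, 59), hms(8, 0, 0), hms(8, 30, 0), hms(8, 45, 0)],
    'StopTime':  [hms(8, 10, 0), hms(8, 59, 59), hms(9, 0, 0), hms(9, 0, 1)],
})

start = time(8, 0, 0)
end = time(9, 0, 0)

# .dt.time yields datetime.time objects, which compare down to the second.
t1 = pd.to_datetime(df.StartTime, unit='s').dt.time
t2 = pd.to_datetime(df.StopTime, unit='s').dt.time

# Use `t2 < end` instead of `t2 <= end` to stop at 08:59:59.
selected = df[(t1 >= start) & (t2 <= end)]
print(selected)
```

Two caveats: `pd.to_datetime(..., unit='s')` reads the timestamps as UTC, whereas `datetime.fromtimestamp` in the plain-Python loop uses local time, and a window that crosses midnight would need the two conditions combined with `|` rather than `&`.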
Solution

Try this:
    import pandas as pd
    from datetime import datetime,time
    import numpy as np

    fn = r'D:\data\gDrive\data\.stack.overflow\2016-07\dart_small.csv'
    cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
    df = pd.read_csv(fn, header=None, names=cols)

    df['m'] = df.StopTime + df.StartTime
    df['d'] = df.StopTime - df.StartTime

    # filter input data set ...
    start_hour = 8
    end_hour = 20
    df = df[(pd.to_datetime(df.StartTime, unit='s').dt.hour >= 8) & (pd.to_datetime(df.StartTime, unit='s').dt.hour <= 20)]

    # 'start' and 'end' for the reporting DF: `r`
    # which will contain equal intervals (1 hour in this case)
    start = pd.to_datetime(df.StartTime.min(), unit='s').date()
    end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

    # building reporting DF: `r`
    freq = '1H'    # 1 Hour frequency
    idx = pd.date_range(start, end, freq=freq)
    r = pd.DataFrame(index=idx)
    r = r[(r.index.hour >= start_hour) & (r.index.hour <= end_hour)]
    r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)

    # 1 hour in seconds, minus one second (so that we will not count it twice)
    interval = 60*60 - 1

    r['LogCount'] = 0
    r['UniqueIDCount'] = 0

    for i, row in r.iterrows():
        # intervals overlap test
        # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
        # i've slightly simplified the calculations of m and d
        # by getting rid of division by 2,
        # because it can be done eliminating common terms
        u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
        r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

    r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
    r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
    r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
    r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

    #r.to_csv('results.csv', index=False)
    #print(r[r.LogCount > 0])
    #print (r['StartTime'], r['EndTime'], r['Day'], r['LogCount'], r['UniqueIDCount'])

    rout = r[['Date', 'StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']]
    #print rout
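The overlap test inside the loop is compact, so it may help to see it next to the direct form. Since `m` is twice the log's center and `d` twice its half-length, the condition `abs(m - 2*start - interval) < d + interval` says that the distance between the two centers is smaller than the sum of the half-lengths, which works out to "each interval starts before the other one ends". The sketch below checks that equivalence on a few made-up intervals; `overlaps_direct`, `overlaps_md` and the sample values are hypothetical and use seconds since midnight for readability.

```python
def overlaps_direct(log_start, log_stop, bin_start, interval):
    """Direct test: each interval starts before the other one ends."""
    bin_end = bin_start + interval
    return log_start < bin_end and bin_start < log_stop


def overlaps_md(log_start, log_stop, bin_start, interval):
    """The same test written with m and d, as in the loop above."""
    m = log_stop + log_start  # twice the log's center
    d = log_stop - log_start  # twice the log's half-length
    return abs(m - 2 * bin_start - interval) < d + interval


interval = 60 * 60 - 1  # one hour in seconds, minus one second
bin_start = 8 * 3600    # made-up bin starting at 08:00:00

samples = [
    (7 * 3600, 7 * 3600 + 1800),         # ends before the bin starts
    (7 * 3600 + 1800, 8 * 3600 + 60),    # straddles the bin start
    (8 * 3600 + 600, 8 * 3600 + 900),    # lies inside the bin
    (9 * 3600, 9 * 3600 + 300),          # starts exactly at the next bin
]

results = [overlaps_md(a, b, bin_start, interval) for a, b in samples]
print(results)
```

Because the comparison is strict, a log that only touches a bin at a single endpoint is not counted for that bin.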
OLD answer:

    from_time = '08:00'
    to_time = '18:00'
    rout.between_time(from_time, to_time).to_csv('one_hour.csv', index=False, header=False)
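The old answer relies on `DataFrame.between_time`, which selects rows by time of day from a frame that has a `DatetimeIndex` and includes both endpoints by default. As a rough illustration of how the window can be narrowed to the second, the sketch below applies it to a made-up hourly frame standing in for `rout`; the `window` name and the sample data are assumptions.

```python
from datetime import time

import pandas as pd

# Hourly bins for one made-up day, standing in for the `rout` frame.
idx = pd.date_range('2016-07-01', periods=24, freq=pd.Timedelta(hours=1))
rout = pd.DataFrame({'LogCount': range(24)}, index=idx)

# Keep the bins that start between 08:00:00 and 17:59:59, endpoints included.
window = rout.between_time(time(8, 0, 0), time(17, 59, 59))
print(window.index.hour.tolist())
```

Ending the window at 17:59:59 rather than 18:00:00 leaves out the bin that starts at 18:00.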
