Convert datetime to milliseconds since midnight (UTC or localized) in a CSV file using Pandas
Problem description
import pandas as pd
import numpy as np
from datetime import datetime, time
# history file and batch size for processing.
historyFilePath = 'EURUSD.SAMPLE.csv'
batch_size = 5000
# function for date parsing
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
# load data into a pandas iterator with all the chunks
ratesFromCSVChunks = pd.read_csv(historyFilePath, index_col=0, engine='python', parse_dates=True,
date_parser=dateparse, header=None,
names=["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"],
iterator=True,
chunksize=batch_size)
# concatenate chunks to get the final array
ratesFromCSV = pd.concat([chunk for chunk in ratesFromCSVChunks])
# save final csv file
ratesFromCSV.to_csv('EURUSD_processed.csv', date_format='%Y-%m-%d %H:%M:%S.%f',
columns=['1_Current', '2_BidPx', '3_BidSz', '4_AskPx', '5_AskSz'], header=False, float_format='%.5f')
I am reading a CSV file that contains:
2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000
2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000
2014-08-17 17:00:02.000000,1.33900,1.33900,1.00000,1.33940,1.00000
2014-08-17 17:00:04.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:05.000000,1.33930,1.33930,1.00000,1.33950,1.00000
2014-08-17 17:00:06.000000,1.33920,1.33920,1.00000,1.33960,1.00000
2014-08-17 17:00:06.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:08.000000,1.33900,1.33900,1.00000,1.33942,1.00000
2014-08-17 17:00:16.000000,1.33900,1.33900,1.00000,1.33940,1.00000
How do I convert the datetime values in the CSV file being read (or in the pandas DataFrame) to milliseconds since midnight (UTC or localized), that is, with each day's midnight as the epoch, by the time the data is saved? Each file starts at midnight every day. The only thing that should change is the datetime format, which should become milliseconds since midnight of each day (UTC or localized). The format I am looking for is:
43264234, 1.33910,1.33910,1.00000,1.33930,1.00000
43264739, 1.33910,1.33910,1.00000,1.33950,1.00000
43265282, 1.33910,1.33910,1.00000,1.33930,1.00000
43265789, 1.33900,1.33900,1.00000,1.33940,1.00000
43266318, 1.33910,1.33910,1.00000,1.33950,1.00000
43266846, 1.33930,1.33930,1.00000,1.33950,1.00000
43267353, 1.33920,1.33920,1.00000,1.33960,1.00000
43267872, 1.33910,1.33910,1.00000,1.33950,1.00000
43268387, 1.33900,1.33900,1.00000,1.33942,1.00000
Any help is much appreciated (short and precise, for Python 3.4 or 3.5 and above, with Pandas 0.18.1 and numpy 1.11).
Recommended answer
This snippet of code should be what you want:
# Create some fake data, similar to yours
import pandas as pd
s = pd.Series(pd.date_range('2014-08-17 17:00:01.1230000', periods=4))
print(s)
print(type(s[0]))
# Create a new series using just the date portion of the original data.
# This effectively truncates the time portion.
# Can't use d = s.dt.date or you'll get date objects back, not datetime64.
d = pd.to_datetime(s.dt.date)
print(d)
print(type(d[0]))
# Calculate the time delta between the original datetime and
# just the date portion. This is the elapsed time since your epoch.
delta_t = s-d
print(delta_t)
# Display the elapsed time as seconds.
print(delta_t.dt.total_seconds())
This results in the following output:
0 2014-08-17 17:00:01.123
1 2014-08-18 17:00:01.123
2 2014-08-19 17:00:01.123
3 2014-08-20 17:00:01.123
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0 2014-08-17
1 2014-08-18
2 2014-08-19
3 2014-08-20
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0 17:00:01.123000
1 17:00:01.123000
2 17:00:01.123000
3 17:00:01.123000
dtype: timedelta64[ns]
0 61201.123
1 61201.123
2 61201.123
3 61201.123
dtype: float64
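The answer stops at elapsed seconds as floats, while the question asks for integer milliseconds written back to CSV. The following is a minimal sketch of that last step, not a definitive implementation. It assumes a reasonably recent pandas in which a TimedeltaIndex can be floor-divided by a Timedelta (this may not work on 0.18.1). The two sample rows use made-up sub-second values, and the names `rates` and `since_midnight` are illustrative only. It also uses `normalize()` to get each day's midnight, which is an alternative to the `pd.to_datetime(s.dt.date)` step above.

```python
import io
import pandas as pd

# Two sample rows in the same layout as the question's CSV
# (the sub-second values are made up for illustration).
sample = io.StringIO(
    "2014-08-17 17:00:01.234000,1.33910,1.33910,1.00000,1.33930,1.00000\n"
    "2014-08-17 17:00:02.500000,1.33900,1.33900,1.00000,1.33940,1.00000\n"
)
cols = ["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"]
rates = pd.read_csv(sample, header=None, names=cols, index_col=0, parse_dates=True)

# Elapsed time since midnight of each row's own day.
# normalize() sets the time part of every timestamp to 00:00:00.
since_midnight = rates.index - rates.index.normalize()

# Floor-divide by one millisecond to get whole milliseconds as integers.
rates.index = since_midnight // pd.Timedelta(milliseconds=1)

# With no path given, to_csv returns the CSV text instead of writing a file.
out = rates.to_csv(header=False, float_format="%.5f")
print(out)
```

The timestamps here are treated as naive, so "midnight" is midnight in whatever zone the file was recorded in. If midnight in a specific zone is needed, one option is to call `tz_localize` (and, if required, `tz_convert`) on the index before `normalize()`. Note that on days with a daylight saving transition the difference is then real elapsed time rather than wall-clock time.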