How to utilise the date_parser parameter of pandas.read_csv()
Problem description
I am getting an issue with the timestamp column in my csv file.
ValueError: could not convert string to float: '2020-02-21 22:00:00'
It is raised on the line marked below:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
from datetime import datetime
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")
# Import dataset
df = pd.read_csv('fx_intraday_1min_GBP_USD.csv')
train, test = df[:-3], df[-3:]
scaler = MinMaxScaler()
scaler.fit(train)  # <-- this line raises the ValueError
train = scaler.transform(train)
test = scaler.transform(test)
n_input = 3
n_features = 4
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit_generator(generator, epochs=180)
How can I convert the timestamp column (preferably when reading the csv) to a float?
Recommended answer
Performing Conversion On CSV Input Columns While Reading In The Data
Read in the CSV data, applying a conversion to the timestamp column to get float values:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
... converters={'timestamp':
... lambda t: pd.Timestamp(t).timestamp()})
>>> df
timestamp open high low close
0 1.582322e+09 1.2953 1.2964 1.2953 1.2964
1 1.582322e+09 1.2955 1.2957 1.2952 1.2957
2 1.582322e+09 1.2956 1.2958 1.2954 1.2957
3 1.582322e+09 1.2957 1.2958 1.2954 1.2957
4 1.582322e+09 1.2957 1.2958 1.2955 1.2956
.. ... ... ... ... ...
95 1.582317e+09 1.2966 1.2967 1.2964 1.2965
96 1.582317e+09 1.2967 1.2968 1.2965 1.2966
97 1.582317e+09 1.2965 1.2967 1.2964 1.2966
98 1.582317e+09 1.2964 1.2967 1.2962 1.2966
99 1.582316e+09 1.2963 1.2965 1.2961 1.2964
[100 rows x 5 columns]
This can be applied to other columns too. The converters parameter takes a dictionary in which each key is a column name and each value is a function applied to that column's raw text.
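As a self-contained sketch of the converters approach, the snippet below reads from an in-memory string instead of the real file. The two sample rows and the helper name to_epoch_seconds are illustrative assumptions, not part of the original data or answer.

```python
import io

import pandas as pd

# Illustrative two-row sample using the same column layout as the CSV above.
csv_text = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
"""


def to_epoch_seconds(text):
    # Hypothetical helper: parse the cell text and return seconds since the epoch.
    # pandas treats a timezone-naive Timestamp as UTC in .timestamp().
    return pd.Timestamp(text).timestamp()


# The converter is called once per cell of the 'timestamp' column while parsing.
df = pd.read_csv(io.StringIO(csv_text),
                 converters={'timestamp': to_epoch_seconds})
print(df.dtypes)
```

Because the converter runs once per cell in Python, it is convenient for small files but may be slow on large ones.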
date_parser could be useful if the timestamp data spans more than one column or is in some unusual format. The callback can receive the text from one or more columns for processing. The parse_dates parameter may need to be supplied together with date_parser to indicate which columns the callback applies to. parse_dates is just a list of the column names or indices. An example of usage:
df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
date_parser=lambda t: pd.Timestamp(t),
parse_dates=['timestamp'])
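Note that date_parser has been deprecated since pandas 2.0. For the multi-column case mentioned above, one version-independent alternative is to read the parts as plain text and combine them afterwards with pd.to_datetime() and an explicit format. The sketch below assumes a hypothetical file whose date and time live in separate columns named date and time; the real file in the question has a single timestamp column.

```python
import io

import pandas as pd

# Hypothetical layout: the date and the time are stored in separate columns.
csv_text = """date,time,close
2020-02-21,22:00:00,1.2964
2020-02-21,21:59:00,1.2957
"""

df = pd.read_csv(io.StringIO(csv_text))

# Join the two text columns and parse them with an explicit format string.
df['timestamp'] = pd.to_datetime(df['date'] + ' ' + df['time'],
                                 format='%Y-%m-%d %H:%M:%S')

# The original text columns are no longer needed once the timestamp exists.
df = df.drop(columns=['date', 'time'])
print(df)
```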
pd.read_csv() with no date/time parameters produces a timestamp column of type object. Simply specifying which column is the timestamp using parse_dates, with no other additional parameters, fixes that:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
... parse_dates=['timestamp'])
>>> df.dtypes
timestamp datetime64[ns]
open float64
high float64
low float64
close float64
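A datetime64 column is still not a float, which is what the question asks for. One way to get there without a per-row callback is to subtract the Unix epoch and divide by a one-second Timedelta. The sketch below uses an illustrative in-memory sample rather than the real file, and the timestamps are treated as UTC because they carry no timezone.

```python
import io

import pandas as pd

# Illustrative two-row sample using the same column layout as the CSV above.
csv_text = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

# Vectorised conversion: elapsed time since the epoch, expressed in seconds.
epoch = pd.Timestamp('1970-01-01')
df['timestamp'] = (df['timestamp'] - epoch) / pd.Timedelta(seconds=1)
print(df.dtypes)
```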
Conversion of DataFrame Columns After Reading in CSV
As another user suggested, there's another way to convert the contents of a column: using pd.to_datetime().
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>> df.dtypes
timestamp object
open float64
high float64
low float64
close float64
dtype: object
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df.dtypes
timestamp datetime64[ns]
open float64
high float64
low float64
close float64
dtype: object
>>>
>>> df['timestamp'] = df['timestamp'].apply(lambda t: t.timestamp())
>>> df
timestamp open high low close
0 1.582322e+09 1.2953 1.2964 1.2953 1.2964
1 1.582322e+09 1.2955 1.2957 1.2952 1.2957
2 1.582322e+09 1.2956 1.2958 1.2954 1.2957
3 1.582322e+09 1.2957 1.2958 1.2954 1.2957
4 1.582322e+09 1.2957 1.2958 1.2955 1.2956
.. ... ... ... ... ...
95 1.582317e+09 1.2966 1.2967 1.2964 1.2965
96 1.582317e+09 1.2967 1.2968 1.2965 1.2966
97 1.582317e+09 1.2965 1.2967 1.2964 1.2966
98 1.582317e+09 1.2964 1.2967 1.2962 1.2966
99 1.582316e+09 1.2963 1.2965 1.2961 1.2964
[100 rows x 5 columns]
Or to do it all in one shot without pd.to_datetime():
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>>
>>> df['timestamp'] = df['timestamp'] \
... .apply(lambda t: pd.Timestamp(t).timestamp())
>>>
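To see why this resolves the original error, the sketch below converts the frame to a float array before and after the timestamp conversion, which is roughly what a scaler has to do with its input. The sample rows are illustrative assumptions, and np.asarray() stands in for the scaler's own input validation rather than reproducing it.

```python
import io

import numpy as np
import pandas as pd

# Illustrative two-row sample using the same column layout as the CSV above.
csv_text = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
"""

df = pd.read_csv(io.StringIO(csv_text))

# With the timestamp column still holding text, the float conversion fails.
try:
    np.asarray(df, dtype=float)
except ValueError as exc:
    print('before conversion:', exc)

# After converting the timestamps to epoch seconds, every column is numeric.
df['timestamp'] = df['timestamp'].apply(lambda t: pd.Timestamp(t).timestamp())
values = np.asarray(df, dtype=float)
print('after conversion:', values.shape)
```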