How to utilise the date_parser parameter of pandas.read_csv()

Question

I am getting an error caused by the timestamp column in my CSV file:

ValueError: could not convert string to float: '2020-02-21 22:00:00'

It is raised on the line marked in the code below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")

# Import dataset
df = pd.read_csv('fx_intraday_1min_GBP_USD.csv')

# Hold out the last 3 rows for testing
train, test = df[:-3], df[-3:]
scaler = MinMaxScaler()
scaler.fit(train)  # <-- the ValueError is raised here
train = scaler.transform(train)
test = scaler.transform(test)

n_input = 3
n_features = 4

generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)

model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit_generator(generator, epochs=180)

How can I convert the timestamp column (preferably when reading the csv) to a float?

Link to dataset: https://www.alphavantage.co/query?function=FX_INTRADAY&from_symbol=GBP&to_symbol=USD&interval=1min&apikey=OF7SE183CNQLT9DW&datatype=csv

Answer

Performing Conversion On CSV Input Columns While Reading In The Data

Read in the CSV data, applying a conversion to the timestamp column to get float values (pd.Timestamp.timestamp() returns POSIX seconds and treats a naive time as UTC):

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
...                  converters={'timestamp': 
...                                 lambda t: pd.Timestamp(t).timestamp()})
>>> df
       timestamp    open    high     low   close
0   1.582322e+09  1.2953  1.2964  1.2953  1.2964
1   1.582322e+09  1.2955  1.2957  1.2952  1.2957
2   1.582322e+09  1.2956  1.2958  1.2954  1.2957
3   1.582322e+09  1.2957  1.2958  1.2954  1.2957
4   1.582322e+09  1.2957  1.2958  1.2955  1.2956
..           ...     ...     ...     ...     ...
95  1.582317e+09  1.2966  1.2967  1.2964  1.2965
96  1.582317e+09  1.2967  1.2968  1.2965  1.2966
97  1.582317e+09  1.2965  1.2967  1.2964  1.2966
98  1.582317e+09  1.2964  1.2967  1.2962  1.2966
99  1.582316e+09  1.2963  1.2965  1.2961  1.2964

[100 rows x 5 columns]

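This can be applied to other columns too. The converters parameter takes a dictionary whose keys are column names (or integer column positions) and whose values are functions; each function is called with the raw string from every cell in that column.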
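A self-contained sketch of converters on more than one column, using a few illustrative rows in the same layout as the file instead of the real download (the prices are taken from the output above; the timestamps after 2020-02-21 22:00:00 are assumed). The second converter, which expresses the close price in pips, is purely hypothetical and only shows that any column can get its own function.

```python
import io

import pandas as pd

# Illustrative rows in the layout of fx_intraday_1min_GBP_USD.csv (assumed data).
CSV_TEXT = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
2020-02-21 21:58:00,1.2956,1.2958,1.2954,1.2957
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    converters={
        # Called once per cell with the raw string; returns POSIX seconds.
        "timestamp": lambda t: pd.Timestamp(t).timestamp(),
        # Hypothetical second converter: close price expressed in pips.
        "close": lambda s: float(s) * 10_000,
    },
)
print(df.dtypes)
print(df)
```

Because every cell passes through a Python function, converters are flexible but per-cell; for large files a vectorized conversion after reading may be faster.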

date_parser could be useful if the timestamp data spans more than one column or is in an unusual format. The callback can receive the text from one or more columns for processing. The parse_dates parameter must be supplied along with date_parser to indicate which columns to apply the callback to; parse_dates is just a list of column names or indices. An example of usage:

df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
                 date_parser=lambda t: pd.Timestamp(t), 
                 parse_dates=['timestamp'])
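Note that date_parser was deprecated in pandas 2.0 (date_format is the suggested replacement when the format is fixed), so code relying on it may warn or break on newer versions. A version-independent sketch for the "spans more than one column" case is to read the pieces as strings and combine them with pd.to_datetime() and an explicit format. The date and time column names below are hypothetical; the Alpha Vantage file itself has a single timestamp column.

```python
import io

import pandas as pd

# Hypothetical layout: date and time stored in separate columns.
CSV_TEXT = """date,time,close
2020-02-21,22:00:00,1.2964
2020-02-21,21:59:00,1.2957
"""

# Read the date and time columns as plain strings.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"date": str, "time": str})

# Join the two strings per row and parse them with an explicit format,
# which also covers "unusual format" cases without a callback.
df["timestamp"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S"
)
df = df.drop(columns=["date", "time"])
print(df.dtypes)
```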

pd.read_csv() with no date/time parameters produces a timestamp column of type object. Simply naming the timestamp column in parse_dates, with no other parameters, fixes that:

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
...                  parse_dates=['timestamp'])
>>> df.dtypes
timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64
dtype: object
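The outputs in this answer show the rows in descending time order (row 0 has the largest timestamp, row 99 the smallest). If the data feeds a time-series model like the one in the question, it may be worth sorting oldest-first once the column is parsed, so that a split such as df[:-3], df[-3:] holds out the most recent rows rather than the oldest. A minimal sketch with a few illustrative rows (assumed data, newest first like the file):

```python
import io

import pandas as pd

# Illustrative rows, newest first, in the file's layout (assumed data).
CSV_TEXT = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
2020-02-21 21:58:00,1.2956,1.2958,1.2954,1.2957
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["timestamp"])

# Sort oldest row first and renumber the index 0..n-1.
df = df.sort_values("timestamp").reset_index(drop=True)
print(df)
```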

Conversion of DataFrame Columns After Reading in CSV

As another user suggested, there's another way to convert the contents of a column using pd.to_datetime().

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>> df.dtypes
timestamp     object
open         float64
high         float64
low          float64
close        float64
dtype: object
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df.dtypes
timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64
dtype: object
>>> 
>>> df['timestamp'] = df['timestamp'].apply(lambda t: t.timestamp())
>>> df
       timestamp    open    high     low   close
0   1.582322e+09  1.2953  1.2964  1.2953  1.2964
1   1.582322e+09  1.2955  1.2957  1.2952  1.2957
2   1.582322e+09  1.2956  1.2958  1.2954  1.2957
3   1.582322e+09  1.2957  1.2958  1.2954  1.2957
4   1.582322e+09  1.2957  1.2958  1.2955  1.2956
..           ...     ...     ...     ...     ...
95  1.582317e+09  1.2966  1.2967  1.2964  1.2965
96  1.582317e+09  1.2967  1.2968  1.2965  1.2966
97  1.582317e+09  1.2965  1.2967  1.2964  1.2966
98  1.582317e+09  1.2964  1.2967  1.2962  1.2966
99  1.582316e+09  1.2963  1.2965  1.2961  1.2964

[100 rows x 5 columns]
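The apply() call above runs a Python lambda once per row, which is fine for 100 rows but may be slow on much larger files. A vectorized sketch that should give the same floats for naive timestamps (treated as UTC, as Timestamp.timestamp() does) subtracts the Unix epoch and divides by one second; unlike casting to int64, this does not depend on the column's internal time unit.

```python
import pandas as pd

# Stand-in for the parsed column (assumed values).
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2020-02-21 22:00:00", "2020-02-21 21:59:00"])}
)

# Time elapsed since 1970-01-01, expressed as float seconds.
epoch = pd.Timestamp("1970-01-01")
df["timestamp"] = (df["timestamp"] - epoch) / pd.Timedelta(seconds=1)
print(df)
```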

Or, without pd.to_datetime(), do the whole conversion in a single apply() call:

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>>
>>> df['timestamp'] = df['timestamp'] \
...                      .apply(lambda t: pd.Timestamp(t).timestamp())
>>>
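Once the timestamp column is a float, every column should be numeric, which is what the scaler in the question needed. A small sketch (illustrative rows, assumed data) that checks this and produces a plain float64 array; note the result has five columns (timestamp plus four prices), which may be worth comparing with n_features = 4 in the question's model.

```python
import io

import numpy as np
import pandas as pd

# Illustrative rows in the file's layout (assumed data).
CSV_TEXT = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df["timestamp"] = df["timestamp"].apply(lambda t: pd.Timestamp(t).timestamp())

# Fail loudly if any column is still non-numeric (e.g. strings).
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
assert not non_numeric, f"non-numeric columns: {non_numeric}"

# A plain float64 array that numeric code such as MinMaxScaler can consume.
values = df.to_numpy(dtype=np.float64)
print(values.shape, values.dtype)
```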
