Datetime issues while predicting time series in Pandas

Problem description

I'm trying to implement a time series prediction model in Python, but I'm running into problems with datetime data.

So I have a dataframe 'df' with two columns, one of datetime type and one of float type:

Then I try to build an array using the values method. But something strange happens: the dates are displayed in an odd format, as timestamps that include a time of day:

Basically, because of this I cannot implement the model; I get error messages such as "Cannot add integral value to Timestamp without freq."
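For context, that error comes from adding a plain integer to a pandas Timestamp. A bare integer carries no unit, so pandas refuses to add it (older releases only allowed it when the Timestamp had a freq set). A minimal sketch of the behaviour and two ways around it; since older pandas raises ValueError with the message quoted above while newer pandas raises TypeError with different wording, the sketch catches both. The dates are arbitrary examples.

```python
import pandas as pd

ts = pd.Timestamp("2016-02-01")

# A bare integer has no unit (days? seconds? nanoseconds?), so pandas rejects it.
try:
    ts + 1
    int_add_rejected = False
except (TypeError, ValueError) as exc:
    int_add_rejected = True
    print(f"{type(exc).__name__}: {exc}")

# Option 1: state the step explicitly with a Timedelta.
next_day = ts + pd.Timedelta(days=1)
print(next_day)  # 2016-02-02 00:00:00

# Option 2: let a DatetimeIndex with an explicit frequency supply the step.
idx = pd.date_range("2016-02-01", periods=5, freq="D")
following = idx[-1] + idx.freq
print(following)  # 2016-02-06 00:00:00
```

Which option fits depends on what the model expects; the second one is closer to what the error message asks for, because the step comes from the index's frequency rather than from a hard-coded Timedelta.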

So what seems to be the problem, and how can it be solved?

Recommended answer

This is complicated.

First of all, a numpy array holds a single dtype for all of its elements. However, datetime64 is not the same type as int, so we'll have to resolve that, and we will.
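The mismatch is easier to resolve than it sounds: a datetime64[ns] value is stored as a 64-bit integer count of nanoseconds since 1970-01-01, which is why dates can be cast to int64 and back without loss. A small NumPy check of that claim (the dates are arbitrary examples chosen to match the output further down):

```python
import numpy as np

dates = np.array(["2016-02-01", "2016-02-02"], dtype="datetime64[ns]")

# Reinterpret as integers: nanoseconds since the Unix epoch.
ns = dates.astype("int64")
print(ns)  # [1454284800000000000 1454371200000000000]

# Integer-divide by the number of nanoseconds in a day to get whole days since the epoch.
days = ns // (86_400 * 10**9)
print(days)  # [16832 16833]

# Casting the integers back to datetime64[ns] recovers the original dates exactly.
back = ns.astype("datetime64[ns]")
print((back == dates).all())  # True
```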

Second, you tried to do this with df.values. That makes sense; however, what happens is that pandas converts the whole df to dtype=object and returns an object array. The problem with that is that the datetime values are left as pandas Timestamp objects, which is what gets in your way.
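The df.values behaviour described above can be reproduced with a small frame. The column names follow the answer; the values are made up for illustration. Because one numpy array can hold only one dtype, a datetime64 column and a float64 column are combined into an object array whose date cells are pandas Timestamp objects, while pulling out a single column keeps its native dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's frame: one datetime column, one float column.
df = pd.DataFrame({
    "transaction_date": pd.date_range("2016-02-01", periods=3, freq="D"),
    "amount": [1.0, 2.5, 3.0],
})

print(df.dtypes)

# Both columns must share one dtype, so pandas falls back to object.
mixed = df.values
print(mixed.dtype)  # object
print(isinstance(mixed[0, 0], pd.Timestamp))  # True

# Selecting one column at a time keeps its native numpy dtype.
date_values = df["transaction_date"].values
amount_values = df["amount"].values
print(date_values.dtype, amount_values.dtype)
```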

So I'd convert them myself, like this:

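import numpy as np

# Cast each column to int64 separately: datetime64[ns] becomes nanoseconds since
# the epoch; the float 'amount' column is truncated to whole numbers by this cast.
a = np.column_stack([df[c].values.astype('int64') for c in ['transaction_date', 'amount']])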

a

array([[1454284800000000000,                   1],
       [1454371200000000000,                   2],
       [1454457600000000000,                   3],
       [1454544000000000000,                   4],
       [1454630400000000000,                   5]])
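Casting 'amount' to an integer type drops any fractional part (2.5 becomes 2), which is harmless for whole-number amounts like the ones in this output but not for real float amounts. A minimal sketch that keeps the amounts intact, assuming the model only needs a numeric time axis rather than raw nanosecond counts: express each date as float days elapsed since the first date and stack that with the unchanged float amounts. The sample values and the days-since-start encoding are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with fractional amounts and a gap in the dates.
df = pd.DataFrame({
    "transaction_date": np.array(
        ["2016-02-01", "2016-02-02", "2016-02-04"], dtype="datetime64[ns]"
    ),
    "amount": [1.25, 2.5, 3.75],
})

dates = df["transaction_date"].values  # datetime64[ns] array

# Subtracting gives timedelta64 values; dividing by one day yields float64 days.
days = (dates - dates[0]) / np.timedelta64(1, "D")

# Both columns are float64 now, so column_stack keeps every amount unchanged.
a = np.column_stack([days, df["amount"].values])
print(a)
```

The trade-off is that the first column is now relative to the first date, so converting back requires adding the offsets to that first date rather than a single astype call.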

We can always convert the first column of a back like this:

a[:, 0].astype(df.transaction_date.values.dtype)

array(['2016-02-01T00:00:00.000000000', '2016-02-02T00:00:00.000000000',
       '2016-02-03T00:00:00.000000000', '2016-02-04T00:00:00.000000000',
       '2016-02-05T00:00:00.000000000'], dtype='datetime64[ns]')
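The astype call above works because the integers are nanosecond counts and the target dtype is datetime64[ns]. If a pandas object is more convenient than a raw numpy array (for example, to use the result as a DatetimeIndex), pd.to_datetime reads integers as nanoseconds since the epoch by default. A short sketch comparing the two routes on two of the values from the output above:

```python
import numpy as np
import pandas as pd

# int64 nanosecond counts, as in the first column of `a`.
ns = np.array([1454284800000000000, 1454371200000000000], dtype="int64")

# NumPy route: reinterpret the integers as datetime64[ns].
as_numpy = ns.astype("datetime64[ns]")

# pandas route: integers are interpreted with unit="ns" by default.
as_pandas = pd.to_datetime(ns)

print(as_numpy)
print(as_pandas)
```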
