Date issue with scatter and LinearRegression


Problem description

I have two issues, and I believe both are related to the date format.

I have a CSV file with dates and values:

2012-01-03 00:00:00     95812    
2012-01-04 00:00:00    101265 
... 
2016-10-21 00:00:00     93594

After I load it with read_csv, I try to parse the dates with:

X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')

I also tried:

from datetime import datetime  # pd.datetime was only an alias and is gone in newer pandas
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)

I also tried the infer_datetime_format parameter.

All of them seem to work, because when I print the data the dates look like 2012-01-03.
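A quick way to confirm that parsing really worked is to check the column's dtype. The sketch below uses an in-memory stand-in for sales.csv: the column names Dated and Val come from the question, while the comma-separated layout is an assumption. It shows that the column becomes a NumPy datetime dtype (datetime64[ns] in pandas 1.x/2.x, which is the '<M8[ns]' in the later error message), and that pandas prints only the date part because every time is midnight.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales.csv; the comma-separated layout is assumed.
csv_text = """Dated,Val
2012-01-03 00:00:00,95812
2012-01-04 00:00:00,101265
2016-10-21 00:00:00,93594
"""

X = pd.read_csv(io.StringIO(csv_text), parse_dates=['Dated'])

# The column holds real datetimes, not strings, so parsing succeeded.
print(X['Dated'].dtype)

# The time part is hidden in the printout because all times are 00:00:00.
print(X)
```

So the parsing step is fine; the trouble only starts when this datetime column is handed to code that expects plain floats.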

The issue appears when I try to plot the data on a chart. This line:

ax.scatter(X.Dated, X.Val, c='green', marker='.')

gives me an error:

TypeError: invalid type promotion

Also, when I use it with the LinearRegression() algorithm, fit works fine, but score and predict give me this error:

TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'
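The original answer below only covers the scatter error, so here is a hedged sketch of the second one. The message names NumPy's 'safe' casting rule, and NumPy indeed refuses to treat datetime64 as float64 under that rule. One common workaround (an assumption here, not something taken from the original answer) is to turn each date into a numeric feature such as "days since the first date" before it reaches the model. The sample values are the three rows shown in the question.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(['2012-01-03', '2012-01-04', '2016-10-21']))
y = np.array([95812, 101265, 93594], dtype=float)

# NumPy will not cast datetime64 to float64 under the 'safe' rule,
# which is the rule named in the error message.
print(np.can_cast(dates.to_numpy().dtype, np.float64, casting='safe'))  # False

# Possible workaround: express each date as float days since the first date.
days = (dates - dates.min()).dt.days.to_numpy(dtype=float)

# scikit-learn estimators expect a 2-D (n_samples, n_features) array.
features = days.reshape(-1, 1)
print(features.dtype, features.ravel())
```

Whatever numeric encoding is chosen, the same conversion has to be applied to the data passed to score and predict as well, not only to fit.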

I tried many things to fix it, but with no luck. Any help would be appreciated.

Recommended answer

ax.scatter (at the time this answer was written) does not accept a Pandas Series, but it does accept a list of Pandas Timestamps (e.g. X['Dated'].tolist()) or a NumPy array of dtype datetime64[ns] (e.g. X['Dated'].values):

import pandas as pd
import matplotlib.pyplot as plt

X = pd.DataFrame({'Dated': [pd.Timestamp('2012-01-03 00:00:00'),
                            pd.Timestamp('2012-01-04 00:00:00'),
                            pd.Timestamp('2016-10-21 00:00:00')],
                  'Val': [95812, 101265, 93594]})

fig, ax = plt.subplots()
# ax.scatter(X['Dated'].tolist(), X['Val'], c='green', marker='.', s=200)
ax.scatter(X['Dated'].values, X['Val'], c='green', marker='.', s=200)
plt.show()
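Another option, offered here as an alternative rather than as the answer's method, is to convert the dates to Matplotlib date numbers (floats) yourself with matplotlib.dates.date2num, so scatter only ever sees two float arrays. The axis is then told to format its floats as dates. The Agg backend and the draw call are only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X = pd.DataFrame({'Dated': pd.to_datetime(['2012-01-03', '2012-01-04', '2016-10-21']),
                  'Val': [95812, 101265, 93594]})

# Convert dates to Matplotlib date numbers (float days) before plotting,
# so scatter never has to combine datetime64 and float64 values.
x = mdates.date2num(X['Dated'].to_numpy())

fig, ax = plt.subplots()
points = ax.scatter(x, X['Val'], c='green', marker='.', s=200)

# Interpret the x values as dates and label the ticks accordingly.
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate()
fig.canvas.draw()  # render once (headless) to make sure the plot builds
```

Because the conversion happens explicitly, this variant does not depend on how a particular Matplotlib or pandas version handles datetime inputs inside scatter.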

Under the hood, the ax.scatter method calls

x = self.convert_xunits(x)
y = self.convert_yunits(y)

to handle date-like inputs. convert_xunits converts NumPy datetime64 arrays to Matplotlib datenums, but it converts a Pandas time series to a NumPy datetime64 array.

So when a Pandas time series is passed as input to ax.scatter, the code ends up failing when it reaches this line:

offsets = np.dstack((x, y))

np.dstack tries to promote the dtypes of its inputs to one common dtype. If x has dtype datetime64[ns] and y has dtype float64, then

TypeError: invalid type promotion

is raised, because no native NumPy dtype is compatible with both.
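The promotion failure can be reproduced with NumPy alone, which makes the explanation above easy to check. This is a minimal sketch using the question's three sample rows; on NumPy 1.25 and later the exception class is DTypePromotionError, a subclass of TypeError, while older versions raise a plain TypeError with the message "invalid type promotion". Converting x to a numeric type first (here float days since the Unix epoch, an arbitrary choice for illustration) lets the same call succeed.

```python
import numpy as np

x = np.array(['2012-01-03', '2012-01-04', '2016-10-21'], dtype='datetime64[ns]')
y = np.array([95812.0, 101265.0, 93594.0])

# Mixing datetime64[ns] and float64 forces NumPy to find a common dtype, and there is none.
try:
    np.dstack((x, y))
except TypeError as exc:  # DTypePromotionError on newer NumPy is a TypeError subclass
    print(type(exc).__name__, exc)

# With x expressed as float days since 1970-01-01, dstack works.
x_days = x.astype('datetime64[D]').astype(np.int64).astype(np.float64)
offsets = np.dstack((x_days, y))
print(offsets.shape)  # (1, 3, 2)
```

This is essentially what the explicit date2num conversion achieves inside Matplotlib: once both inputs are floats, np.dstack has a common dtype to work with.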

