Date issue with scatter and LinearRegression
Problem description
I have two issues, and I believe both are related to the date format.
I have a CSV with dates and values:
2012-01-03 00:00:00 95812
2012-01-04 00:00:00 101265
...
2016-10-21 00:00:00 93594
After I load it with read_csv, I try to parse the date with:
X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')
I also tried:
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)
and the infer_datetime_format parameter.
All of them seem to work fine, because when I print it out the date looks like: 2012-01-03.
The issue appears when I try to plot the data on a chart. This line:
ax.scatter(X.Dated, X.Val, c='green', marker='.')
gives me an error:
TypeError: invalid type promotion
Also, when I try to use it with the LinearRegression() algorithm, the fit command works fine, but score and predict give me this error:
TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'
I tried many things to fix it, but with no luck. Any help would be appreciated.
Recommended answer
ax.scatter (at the moment) does not accept a Pandas Series, but it can accept a list of Pandas Timestamps (e.g. X['Dated'].tolist()), or a NumPy array of dtype datetime64[ns] (e.g. X['Dated'].values):
import pandas as pd
import matplotlib.pyplot as plt
X = pd.DataFrame({'Dated': [pd.Timestamp('2012-01-03 00:00:00'),
                            pd.Timestamp('2012-01-04 00:00:00'),
                            pd.Timestamp('2016-10-21 00:00:00')],
                  'Val': [95812, 101265, 93594]})
fig, ax = plt.subplots()
# ax.scatter(X['Dated'].tolist(), X['Val'], c='green', marker='.', s=200)
ax.scatter(X['Dated'].values, X['Val'], c='green', marker='.', s=200)
plt.show()
Internally, ax.scatter calls
x = self.convert_xunits(x)
y = self.convert_yunits(y)
to handle date-like inputs. convert_xunits converts NumPy datetime64 arrays to Matplotlib datenums, but it converts a Pandas time series only to a NumPy datetime64 array.
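To make the idea of a Matplotlib datenum concrete, here is a minimal sketch. It assumes that matplotlib.dates.date2num performs the same kind of conversion the answer attributes to convert_xunits; the variable names are illustrative only.

```python
import numpy as np
import matplotlib.dates as mdates

# Three dates from the question, stored as a NumPy datetime64 array.
dates = np.array(['2012-01-03', '2012-01-04', '2016-10-21'],
                 dtype='datetime64[ns]')

# date2num maps each date to a float counting days since Matplotlib's epoch.
datenums = mdates.date2num(dates)

print(datenums.dtype)             # the result is a plain float array
print(datenums[1] - datenums[0])  # consecutive calendar days differ by one unit
```

Because datenums are ordinary floats, they can be combined with the float y values without any dtype conflict.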
So, when a Pandas time series is passed as input to ax.scatter, the code ends up failing when this line is reached:
offsets = np.dstack((x, y))
np.dstack tries to promote the dtypes of its inputs to one common dtype. If x has dtype datetime64[ns] and y has dtype float64, then
TypeError: invalid type promotion
is raised since there is no native NumPy dtype which is compatible with both.
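The promotion failure can be reproduced directly, and converting the dates to plain floats avoids it. The sketch below is not part of the original answer: expressing each date as days since the first date is one assumed encoding among several, and np.polyfit is used only as a stand-in for a regression fit so that the example depends on nothing beyond NumPy and pandas.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    'Dated': pd.to_datetime(['2012-01-03', '2012-01-04', '2016-10-21']),
    'Val': [95812.0, 101265.0, 93594.0],
})

x = X['Dated'].values   # a NumPy datetime64 array
y = X['Val'].values     # a float64 array

# Stacking the raw arrays fails: datetime64 and float64 share no common dtype.
try:
    np.dstack((x, y))
    promotion_failed = False
except TypeError as exc:
    promotion_failed = True
    print('dstack failed:', type(exc).__name__)

# Assumption: encode each date as the number of days since the earliest date.
days = (X['Dated'] - X['Dated'].min()).dt.total_seconds().values / 86400.0

# Both inputs are float64 now, so stacking succeeds with shape (1, 3, 2).
offsets = np.dstack((days, y))

# Stand-in for a regression: a least-squares straight line through the points.
slope, intercept = np.polyfit(days, y, deg=1)
predicted = slope * days + intercept
```

The same numeric column is what a scikit-learn estimator would need as well, reshaped to two dimensions (for example days.reshape(-1, 1)), since it expects float input rather than datetime64 values.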