Trendline plotting not working with a big dataset
Problem description
I have a large dataset with 52,166 data points that looks like this:
bc_conc
2010-04-09 10:00:00 609.542000
2010-04-09 11:00:00 663.500000
2010-04-09 12:00:00 524.661667
2010-04-09 13:00:00 228.706667
2010-04-09 14:00:00 279.721667
It is a pandas DataFrame with a datetime index. I would like to plot bc_conc against time and add a trendline.
I used the following code:
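import matplotlib.dates
import matplotlib.pyplot as plt
import numpy as np

# data is the DataFrame shown above; resample to monthly means
data = data.resample('M', closed='left', label='left').mean()
x1 = data.index
x2 = matplotlib.dates.date2num(data.index.to_pydatetime())  # dates as floats for the fit
y = data.bc_conc
z = np.polyfit(x2, y, 1)  # coefficients of a first-degree (linear) fit
p = np.poly1d(z)
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
plt.plot_date(x=x1, y=y, fmt='b-')  # measured data as a blue line
plt.plot(x1, p(x2), 'ro')  # fitted trend as red dots
plt.show()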
However, as you can see, I resampled my data. I did this because if I don't, the code just gives me a plot of the data without the trendline. If I resample to days, the plot still has no trendline. If I resample to months, a trendline shows.
It seems as if the code only works for a smaller dataset. Why is this? I was wondering if anyone could explain this to me, because I would like to resample my data to days, but no further.
Thanks in advance
This code seems to work fine, whether using hourly or daily resampled data.
Starting with 100,000 data points:
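from datetime import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Linear ramp from 0 to 1000 plus Gaussian noise (standard deviation 100)
y = np.arange(0, 1000, .01) + np.random.normal(0, 100, 100000)
data = pd.DataFrame(data={'bc_conc': y}, index=pd.date_range(freq='H', start=datetime(2000, 1, 1), periods=len(y)))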
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100000 entries, 2000-01-01 00:00:00 to 2011-05-29 15:00:00
Freq: H
Data columns (total 1 columns):
bc_conc 100000 non-null float64
dtypes: float64(1)
bc_conc
2000-01-01 00:00:00 -30.639811
2000-01-01 01:00:00 -26.791396
2000-01-01 02:00:00 -121.542718
2000-01-01 03:00:00 -69.267944
2000-01-01 04:00:00 117.731532
Calculation of trendline with optional resampling:
data = data.resample('D', closed='left', label='left').mean() # optional for daily data
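x2 = matplotlib.dates.date2num(data.index.to_pydatetime()) # Dates as floats: days (including fractions) since Matplotlib's epoch; the output below is from a version whose epoch was 0001-01-01 00:00:00 UTC plus one (Matplotlib 3.3 and later default to 1970-01-01)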
[ 730120. 730121. 730122. ..., 734284. 734285. 734286.]
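To make the date numbers printed above less opaque, here is a small standard-library sketch of the convention they follow: the proleptic Gregorian ordinal of the date plus the time of day as a fraction. The helper name legacy_date_number is made up for this illustration and is not part of Matplotlib. As far as I know, newer Matplotlib versions (3.3 and later) count days from 1970-01-01 by default, so date2num may return different numbers there; the fitted slope is unaffected, only the intercept shifts.

```python
from datetime import datetime


def legacy_date_number(dt):
    """Days since 0001-01-01 00:00:00 plus one, with the time of day as a fraction.

    Hypothetical helper that mimics the numbers shown in the output above.
    """
    seconds_into_day = (
        dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    )
    return dt.toordinal() + seconds_into_day / 86400.0


first = legacy_date_number(datetime(2000, 1, 1))      # first day of the index
last = legacy_date_number(datetime(2011, 5, 29))      # last day of the index
noon = legacy_date_number(datetime(2000, 1, 1, 12))   # half a day later than `first`
print(first, last, noon)  # 730120.0 734286.0 730120.5
```

The first and last values match the ends of the array printed above.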
z = np.polyfit(x2, data.bc_conc, 1)
[ 2.39988999e-01 -1.75220741e+05] # coefficients
p = np.poly1d(z)
0.24 x - 1.752e+05 # fitted polynomial
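For a first-degree fit, the least-squares line has a simple closed form, which may help when reading the coefficients above. The sketch below computes it in plain Python. It is only an illustration of the result a degree-1 fit should produce, not a description of how np.polyfit works internally, and the name fit_line and the noise-free sample data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centering on the means keeps the arithmetic stable for large x values
    # such as date numbers around 730000.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noise-free data: 100 consecutive day numbers on an exact line
days = [730120.0 + i for i in range(100)]
values = [0.24 * d - 175220.0 for d in days]

slope, intercept = fit_line(days, values)
print(slope, intercept)
```

The slope is in units of bc_conc per day because the x values are day numbers. In the answer's data the series rises by 0.01 per hourly step, which is 0.24 per day, in line with the fitted coefficient.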
data['trend'] = p(x2) # trend from polynomial fit
bc_conc trend
2000-01-01 -29.794608 0.026983
2000-01-02 6.727729 0.266972
2000-01-03 9.815476 0.506961
2000-01-04 -27.954068 0.746950
2000-01-05 -13.726714 0.986939
data.plot()
plt.show()
Yields: [plot of bc_conc and the fitted trend against time]
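The thread does not establish why the trendline was missing in the question. One possible cause, stated here purely as an assumption, is missing values (NaN) in bc_conc: np.polyfit does not skip NaN, so a single gap can turn every coefficient into NaN and leave nothing to draw. Resampling with .mean() skips NaN inside each bin, so a monthly bin is far less likely to be entirely empty than an hourly or daily one. The sketch below, with the made-up helper fit_trend_ignoring_gaps and made-up data, fits only the finite points.

```python
import numpy as np


def fit_trend_ignoring_gaps(x, y, degree=1):
    """Fit a polynomial trend using only points where both x and y are finite.

    Returns the fitted polynomial and the number of points that were skipped.
    Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(x) & np.isfinite(y)
    coeffs = np.polyfit(x[mask], y[mask], degree)
    return np.poly1d(coeffs), int((~mask).sum())


# Hypothetical data: an exact line with two missing readings
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[[3, 7]] = np.nan

trend, n_dropped = fit_trend_ignoring_gaps(x, y)
print(n_dropped)   # how many points were skipped
print(trend(x))    # trend evaluated at every x, including the gaps
```

With the question's variables, checking data.bc_conc.isna().sum() after resampling would show whether this applies; if it is zero, the cause lies elsewhere.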