在散点图中显示置信限和预测限 [英] Show confidence limits and prediction limits in scatter plot
问题描述
我有两个数据数组作为高度和重量:
import numpy as np, matplotlib.pyplot as plt高度 = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])权重 = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])plt.plot(高度,重量,'bo')plt.show()
我想制作与此类似的情节:
感谢任何想法.
这是我整理的内容.我试着模仿你的截图.
给定
将 numpy 导入为 np将 scipy 导入为 sp导入 scipy.stats 作为统计信息导入 matplotlib.pyplot 作为 plt%matplotlib 内联# 原始数据高度 = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])权重 = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
绘制置信区间的两个详细选项:
def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):"使用简单的方法返回置信区间的轴.笔记-----.. 数学:: left|: hat{mu}_{y|x0} - mu_{y|x0} :
ight|;leq ;T_{n-2}^{.975} ;hat{sigma} ;sqrt{frac{1}{n}+frac{(x_0-ar{x})^2}{sum_{i=1}^n{(x_i-ar{x})^2}}}.. 数学:: hat{sigma} = sqrt{sum_{i=1}^n{frac{(y_i-hat{y})^2}{n-2}}}参考----------.. [1] 杜阿尔特先生.曲线拟合",Jupyter 笔记本.http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/CurveFitting.ipynb"如果 ax 是 None:ax = plt.gca()ci = t * s_err * np.sqrt(1/n + (x2 - np.mean(x))**2/np.sum((x - np.mean(x))**2))ax.fill_between(x2, y2 + ci, y2 - ci, color="#b9cfe7", edgecolor="")返回斧头def plot_ci_bootstrap(xs, ys, resid, nboot=500, ax=None):"使用自举方法返回置信区间的轴.笔记-----bootstrap 方法迭代地重采样残差.它绘制了 `nboot` 条直线并勾勒出带的形状.重叠线的密度表明置信度提高.退货-------ax : 轴- 线群- 上下限(高低)(可选) 注意:对异常值敏感参考----------.. [1] J. Stults.可视化置信区间",各种后果.http://www.variousconsequences.com/2010/02/visualizing-confidence-intervals.html"如果 ax 是 None:ax = plt.gca()bootindex = sp.random.randint对于 _ 范围内(nboot):resamp_resid = resid[bootindex(0, len(resid) - 1, len(resid))]# 为多边形制作系数pc = sp.polyfit(xs, ys + resamp_resid, 1)# 绘制引导集群ax.plot(xs, sp.polyval(pc, xs), b-", linewidth=2, alpha=3.0/float(nboot))返回斧头
代码
# 计算 ---------------------------------------------------------x = 高度y = 权重# 用 Numpy 建模定义方程(a,b):"返回一个一维多项式.""返回 np.polyval(a, b)p, cov = np.polyfit(x, y, 1, cov=True) # 一维多项式拟合的参数和协方差.y_model = equation(p, x) # 使用拟合参数的模型;注意:这里的参数是系数# 统计数据n = weights.size # 观察次数m = p.size # 参数数量dof = n - m # 自由度t = stats.t.ppf(0.975, n - m) # 用于 CI 和 PI 波段# 数据/模型中的错误估计残留 = y - y_modelchi2 = np.sum((resid/y_model)**2) # 卡方;估计数据误差chi2_red = chi2/dof # 减少卡方;测量拟合优度s_err = np.sqrt(np.sum(resid**2)/dof) # 误差的标准差# 绘图 --------------------------------------------------------------------图, ax = plt.subplots(figsize=(8, 6))# 数据ax.plot(x,y,o",颜色=#b9cfe7",标记大小=8,markeredgewidth=1,markeredgecolor=b",markerfacecolor=None")# 合身ax.plot(x, y_model, "-", color="0.1", linewidth=1.5, alpha=0.5, label="Fit")x2 = np.linspace(np.min(x), np.max(x), 100)y2 = 方程(p, x2)# 置信区间(选择一项)plot_ci_manual(t, s_err, n, x, x2, y2, ax=ax)#plot_ci_bootstrap(x, y, resid, ax=ax)# 预测区间pi = t * s_err * np.sqrt(1 + 1/n + (x2 - np.mean(x))**2/np.sum((x - np.mean(x))**2))ax.fill_between(x2, y2 + pi, y2 - pi, color="None", linestyle="--")ax.plot(x2, y2 - pi, "--", color="0.5", label="95% Prediction Limits")ax.plot(x2,y2 + pi,--",颜色=0.5")#plt.show()
以下修改是可选,最初是为了模仿 OP 的预期结果而实现的.
# 图修改 --------------------------------------------------------# 边框ax.spines[top"].set_color(0.5")ax.spines[bottom"].set_color(0.5")ax.spines[left"].set_color(0.5")ax.spines[右"].set_color(0.5")ax.get_xaxis().set_tick_params(direction="out")ax.get_yaxis().set_tick_params(direction="out")ax.xaxis.tick_bottom()ax.yaxis.tick_left()# 标签plt.title(适合权重的图", fontsize=14", fontweight=bold")plt.xlabel(高度")plt.ylabel(重量")plt.xlim(np.min(x) - 1, np.max(x) + 1)# 自定义图例句柄,标签 = ax.get_legend_handles_labels()显示 = (0, 1)anyArtist = plt.Line2D((0, 1), (0, 0), color=#b9cfe7") # 创建自定义艺术家图例 = plt.图例([i 的句柄,如果 i 在显示中,则在 enumerate(handles) 中处理] + [anyArtist],[i 的标签,如果 i 在显示中,则在 enumerate(labels) 中的标签] + [95% 置信限度"],loc=9, bbox_to_anchor=(0, -0.21, 1., 0.102), ncol=3, mode=expand")frame = legend.get_frame().set_edgecolor("0.5")# 保存图plt.tight_layout()plt.savefig("filename.png", bbox_extra_artists=(legend,), bbox_inches="tight")plt.show()
输出
使用plot_ci_manual()
:
使用plot_ci_bootstrap()
:
希望这会有所帮助.干杯.
详情
我相信由于图例在图形之外,它不会出现在 matplotblib 的弹出窗口中.它在 Jupyter 中使用
%maplotlib inline
运行良好.主要置信区间代码(
plot_ci_manual()
)改编自另一个source 产生类似于 OP 的图.您可以通过取消注释来选择一种更高级的技术,称为 residual bootstrapping第二个选项plot_ci_bootstrap()
.
更新
- 这篇博文已经更新,修改后的代码与 Python 3 兼容.
stats.t.ppf()
接受较低的尾概率.根据以下资源,t = sp.stats.t.ppf(0.95, n - m)
更正为t = sp.stats.t.ppf(0.975, n - m))
来反映双边 95% t 统计量(或单边 97.5% t 统计量).- 原始笔记本和方程式
- 统计参考(感谢@Bonlenfum 和@tryptofan)
- 经过验证的 t 值给定
dof=17
一个>
y2
已更新,以更灵活地响应给定模型 (@regeneration).- 添加了一个抽象的
equation
函数来包装模型函数.尽管没有证明,但非线性回归是可能的.根据需要修改适当的变量(感谢@PJW).
另见
- 这篇关于使用
statsmodels
> 图书馆. - 本教程关于使用
绘制波段和计算置信区间不确定性
库(在单独的环境中谨慎安装).
I have two arrays of data as hight and weight:
import numpy as np, matplotlib.pyplot as plt
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
plt.plot(heights,weights,'bo')
plt.show()
I want to produce the plot similiar to this:
http://www.sas.com/en_us/software/analytics/stat.html#m=screenshot6
Any ideas is appreciated.
Here's what I put together. I tried to closely emulate your screenshot.
Given
import numpy as np
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
# Raw Data
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
Two detailed options to plot confidence intervals:
def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):
"""Return an axes of confidence bands using a simple approach.
Notes
-----
.. math:: left| : hat{mu}_{y|x0} - mu_{y|x0} :
ight| ; leq ; T_{n-2}^{.975} ; hat{sigma} ; sqrt{frac{1}{n}+frac{(x_0-ar{x})^2}{sum_{i=1}^n{(x_i-ar{x})^2}}}
.. math:: hat{sigma} = sqrt{sum_{i=1}^n{frac{(y_i-hat{y})^2}{n-2}}}
References
----------
.. [1] M. Duarte. "Curve fitting," Jupyter Notebook.
http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/CurveFitting.ipynb
"""
if ax is None:
ax = plt.gca()
ci = t * s_err * np.sqrt(1/n + (x2 - np.mean(x))**2 / np.sum((x - np.mean(x))**2))
ax.fill_between(x2, y2 + ci, y2 - ci, color="#b9cfe7", edgecolor="")
return ax
def plot_ci_bootstrap(xs, ys, resid, nboot=500, ax=None):
"""Return an axes of confidence bands using a bootstrap approach.
Notes
-----
The bootstrap approach iteratively resampling residuals.
It plots `nboot` number of straight lines and outlines the shape of a band.
The density of overlapping lines indicates improved confidence.
Returns
-------
ax : axes
- Cluster of lines
- Upper and Lower bounds (high and low) (optional) Note: sensitive to outliers
References
----------
.. [1] J. Stults. "Visualizing Confidence Intervals", Various Consequences.
http://www.variousconsequences.com/2010/02/visualizing-confidence-intervals.html
"""
if ax is None:
ax = plt.gca()
bootindex = sp.random.randint
for _ in range(nboot):
resamp_resid = resid[bootindex(0, len(resid) - 1, len(resid))]
# Make coeffs of for polys
pc = sp.polyfit(xs, ys + resamp_resid, 1)
# Plot bootstrap cluster
ax.plot(xs, sp.polyval(pc, xs), "b-", linewidth=2, alpha=3.0 / float(nboot))
return ax
Code
# Computations ----------------------------------------------------------------
x = heights
y = weights
# Modeling with Numpy
def equation(a, b):
"""Return a 1D polynomial."""
return np.polyval(a, b)
p, cov = np.polyfit(x, y, 1, cov=True) # parameters and covariance from of the fit of 1-D polynom.
y_model = equation(p, x) # model using the fit parameters; NOTE: parameters here are coefficients
# Statistics
n = weights.size # number of observations
m = p.size # number of parameters
dof = n - m # degrees of freedom
t = stats.t.ppf(0.975, n - m) # used for CI and PI bands
# Estimates of Error in Data/Model
resid = y - y_model
chi2 = np.sum((resid / y_model)**2) # chi-squared; estimates error in data
chi2_red = chi2 / dof # reduced chi-squared; measures goodness of fit
s_err = np.sqrt(np.sum(resid**2) / dof) # standard deviation of the error
# Plotting --------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(8, 6))
# Data
ax.plot(
x, y, "o", color="#b9cfe7", markersize=8,
markeredgewidth=1, markeredgecolor="b", markerfacecolor="None"
)
# Fit
ax.plot(x, y_model, "-", color="0.1", linewidth=1.5, alpha=0.5, label="Fit")
x2 = np.linspace(np.min(x), np.max(x), 100)
y2 = equation(p, x2)
# Confidence Interval (select one)
plot_ci_manual(t, s_err, n, x, x2, y2, ax=ax)
#plot_ci_bootstrap(x, y, resid, ax=ax)
# Prediction Interval
pi = t * s_err * np.sqrt(1 + 1/n + (x2 - np.mean(x))**2 / np.sum((x - np.mean(x))**2))
ax.fill_between(x2, y2 + pi, y2 - pi, color="None", linestyle="--")
ax.plot(x2, y2 - pi, "--", color="0.5", label="95% Prediction Limits")
ax.plot(x2, y2 + pi, "--", color="0.5")
#plt.show()
The following modifications are optional, originally implemented to mimic the OP's desired result.
# Figure Modifications --------------------------------------------------------
# Borders
ax.spines["top"].set_color("0.5")
ax.spines["bottom"].set_color("0.5")
ax.spines["left"].set_color("0.5")
ax.spines["right"].set_color("0.5")
ax.get_xaxis().set_tick_params(direction="out")
ax.get_yaxis().set_tick_params(direction="out")
ax.xaxis.tick_bottom()
ax.yaxis.tick_left()
# Labels
plt.title("Fit Plot for Weight", fontsize="14", fontweight="bold")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.xlim(np.min(x) - 1, np.max(x) + 1)
# Custom legend
handles, labels = ax.get_legend_handles_labels()
display = (0, 1)
anyArtist = plt.Line2D((0, 1), (0, 0), color="#b9cfe7") # create custom artists
legend = plt.legend(
[handle for i, handle in enumerate(handles) if i in display] + [anyArtist],
[label for i, label in enumerate(labels) if i in display] + ["95% Confidence Limits"],
loc=9, bbox_to_anchor=(0, -0.21, 1., 0.102), ncol=3, mode="expand"
)
frame = legend.get_frame().set_edgecolor("0.5")
# Save Figure
plt.tight_layout()
plt.savefig("filename.png", bbox_extra_artists=(legend,), bbox_inches="tight")
plt.show()
Output
Using plot_ci_manual()
:
Using plot_ci_bootstrap()
:
Hope this helps. Cheers.
Details
I believe that since the legend is outside the figure, it does not show up in matplotblib's popup window. It works fine in Jupyter using
%maplotlib inline
.The primary confidence interval code (
plot_ci_manual()
) is adapted from another source producing a plot similar to the OP. You can select a more advanced technique called residual bootstrapping by uncommenting the second optionplot_ci_bootstrap()
.
Updates
- This post has been updated with revised code compatible with Python 3.
stats.t.ppf()
accepts the lower tail probability. According to the following resources,t = sp.stats.t.ppf(0.95, n - m)
was corrected tot = sp.stats.t.ppf(0.975, n - m)
to reflect a two-sided 95% t-statistic (or one-sided 97.5% t-statistic).- original notebook and equation
- statistics reference (thanks @Bonlenfum and @tryptofan)
- verified t-value given
dof=17
y2
was updated to respond more flexibly with a given model (@regeneration).- An abstracted
equation
function was added to wrap the model function. Non-linear regressions are possible although not demonstrated. Amend appropriate variables as needed (thanks @PJW).
See Also
- This post on plotting bands with
statsmodels
library. - This tutorial on plotting bands and computing confidence intervals with
uncertainties
library (install with caution in a separate environment).
这篇关于在散点图中显示置信限和预测限的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!