在散点图中显示置信限和预测限 [英] Show confidence limits and prediction limits in scatter plot

查看:39
本文介绍了在散点图中显示置信限和预测限的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个数据数组作为高度和重量:

import numpy as np, matplotlib.pyplot as plt高度 = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])权重 = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])plt.plot(高度,重量,'bo')plt.show()

我想制作与此类似的情节:

感谢任何想法.

解决方案

这是我整理的内容.我试着模仿你的截图.

给定

将 numpy 导入为 np将 scipy 导入为 sp导入 scipy.stats 作为统计信息导入 matplotlib.pyplot 作为 plt%matplotlib 内联# 原始数据高度 = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])权重 = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

绘制置信区间的两个详细选项:

def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):"使用简单的方法返回置信区间的轴.笔记-----.. 数学:: left|: hat{mu}_{y|x0} - mu_{y|x0} : 
ight|;leq ;T_{n-2}^{.975} ;hat{sigma} ;sqrt{frac{1}{n}+frac{(x_0-ar{x})^2}{sum_{i=1}^n{(x_i-ar{x})^2}}}.. 数学:: hat{sigma} = sqrt{sum_{i=1}^n{frac{(y_i-hat{y})^2}{n-2}}}参考----------.. [1] 杜阿尔特先生.曲线拟合",Jupyter 笔记本.http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/CurveFitting.ipynb"如果 ax 是 None:ax = plt.gca()ci = t * s_err * np.sqrt(1/n + (x2 - np.mean(x))**2/np.sum((x - np.mean(x))**2))ax.fill_between(x2, y2 + ci, y2 - ci, color="#b9cfe7", edgecolor="")返回斧头def plot_ci_bootstrap(xs, ys, resid, nboot=500, ax=None):"使用自举方法返回置信区间的轴.笔记-----bootstrap 方法迭代地重采样残差.它绘制了 `nboot` 条直线并勾勒出带的形状.重叠线的密度表明置信度提高.退货-------ax : 轴- 线群- 上下限(高低)(可选) 注意:对异常值敏感参考----------.. [1] J. Stults.可视化置信区间",各种后果.http://www.variousconsequences.com/2010/02/visualizing-confidence-intervals.html"如果 ax 是 None:ax = plt.gca()bootindex = sp.random.randint对于 _ 范围内(nboot):resamp_resid = resid[bootindex(0, len(resid) - 1, len(resid))]# 为多边形制作系数pc = sp.polyfit(xs, ys + resamp_resid, 1)# 绘制引导集群ax.plot(xs, sp.polyval(pc, xs), b-", linewidth=2, alpha=3.0/float(nboot))返回斧头

代码

# 计算 ---------------------------------------------------------x = 高度y = 权重# 用 Numpy 建模定义方程(a,b):"返回一个一维多项式.""返回 np.polyval(a, b)p, cov = np.polyfit(x, y, 1, cov=True) # 一维多项式拟合的参数和协方差.y_model = equation(p, x) # 使用拟合参数的模型;注意:这里的参数是系数# 统计数据n = weights.size # 观察次数m = p.size # 参数数量dof = n - m # 自由度t = stats.t.ppf(0.975, n - m) # 用于 CI 和 PI 波段# 数据/模型中的错误估计残留 = y - y_modelchi2 = np.sum((resid/y_model)**2) # 卡方;估计数据误差chi2_red = chi2/dof # 减少卡方;测量拟合优度s_err = np.sqrt(np.sum(resid**2)/dof) # 误差的标准差# 绘图 --------------------------------------------------------------------图, ax = plt.subplots(figsize=(8, 6))# 数据ax.plot(x,y,o",颜色=#b9cfe7",标记大小=8,markeredgewidth=1,markeredgecolor=b",markerfacecolor=None")# 合身ax.plot(x, y_model, "-", color="0.1", linewidth=1.5, alpha=0.5, label="Fit")x2 = np.linspace(np.min(x), np.max(x), 100)y2 = 方程(p, x2)# 置信区间(选择一项)plot_ci_manual(t, s_err, n, x, x2, y2, ax=ax)#plot_ci_bootstrap(x, y, resid, ax=ax)# 预测区间pi = t * s_err * np.sqrt(1 + 1/n + (x2 - np.mean(x))**2/np.sum((x - np.mean(x))**2))ax.fill_between(x2, y2 + pi, y2 - pi, color="None", linestyle="--")ax.plot(x2, y2 - pi, "--", color="0.5", label="95% Prediction Limits")ax.plot(x2,y2 + pi,--",颜色=0.5")#plt.show()

以下修改是可选,最初是为了模仿 OP 的预期结果而实现的.

# 图修改 --------------------------------------------------------# 边框ax.spines[top"].set_color(0.5")ax.spines[bottom"].set_color(0.5")ax.spines[left"].set_color(0.5")ax.spines[右"].set_color(0.5")ax.get_xaxis().set_tick_params(direction="out")ax.get_yaxis().set_tick_params(direction="out")ax.xaxis.tick_bottom()ax.yaxis.tick_left()# 标签plt.title(适合权重的图", fontsize=14", fontweight=bold")plt.xlabel(高度")plt.ylabel(重量")plt.xlim(np.min(x) - 1, np.max(x) + 1)# 自定义图例句柄,标签 = ax.get_legend_handles_labels()显示 = (0, 1)anyArtist = plt.Line2D((0, 1), (0, 0), color=#b9cfe7") # 创建自定义艺术家图例 = plt.图例([i 的句柄,如果 i 在显示中,则在 enumerate(handles) 中处理] + [anyArtist],[i 的标签,如果 i 在显示中,则在 enumerate(labels) 中的标签] + [95% 置信限度"],loc=9, bbox_to_anchor=(0, -0.21, 1., 0.102), ncol=3, mode=expand")frame = legend.get_frame().set_edgecolor("0.5")# 保存图plt.tight_layout()plt.savefig("filename.png", bbox_extra_artists=(legend,), bbox_inches="tight")plt.show()

输出

使用plot_ci_manual():

使用plot_ci_bootstrap():

希望这会有所帮助.干杯.


详情

  1. 我相信由于图例在图形之外,它不会出现在 matplotblib 的弹出窗口中.它在 Jupyter 中使用 %maplotlib inline 运行良好.

  2. 主要置信区间代码(plot_ci_manual())改编自另一个source 产生类似于 OP 的图.您可以通过取消注释来选择一种更高级的技术,称为 residual bootstrapping第二个选项 plot_ci_bootstrap().

更新

  1. 这篇博文已经更新,修改后的代码与 Python 3 兼容.
  2. stats.t.ppf() 接受较低的尾概率.根据以下资源,t = sp.stats.t.ppf(0.95, n - m) 更正为 t = sp.stats.t.ppf(0.975, n - m)) 来反映双边 95% t 统计量(或单边 97.5% t 统计量).
  3. y2 已更新,以更灵活地响应给定模型 (@regeneration).
  4. 添加了一个抽象的equation 函数来包装模型函数.尽管没有证明,但非线性回归是可能的.根据需要修改适当的变量(感谢@PJW).

另见

I have two arrays of data as hight and weight:

import numpy as np, matplotlib.pyplot as plt

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

plt.plot(heights,weights,'bo')
plt.show()

I want to produce the plot similiar to this:

http://www.sas.com/en_us/software/analytics/stat.html#m=screenshot6

Any ideas is appreciated.

解决方案

Here's what I put together. I tried to closely emulate your screenshot.

Given

import numpy as np
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt


%matplotlib inline

# Raw Data
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

Two detailed options to plot confidence intervals:

def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):
    """Return an axes of confidence bands using a simple approach.
    
    Notes
    -----
    .. math:: left| : hat{mu}_{y|x0} - mu_{y|x0} : 
ight| ; leq ; T_{n-2}^{.975} ; hat{sigma} ; sqrt{frac{1}{n}+frac{(x_0-ar{x})^2}{sum_{i=1}^n{(x_i-ar{x})^2}}}
    .. math:: hat{sigma} = sqrt{sum_{i=1}^n{frac{(y_i-hat{y})^2}{n-2}}}
    
    References
    ----------
    .. [1] M. Duarte.  "Curve fitting," Jupyter Notebook.
       http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/CurveFitting.ipynb
    
    """
    if ax is None:
        ax = plt.gca()
    
    ci = t * s_err * np.sqrt(1/n + (x2 - np.mean(x))**2 / np.sum((x - np.mean(x))**2))
    ax.fill_between(x2, y2 + ci, y2 - ci, color="#b9cfe7", edgecolor="")

    return ax


def plot_ci_bootstrap(xs, ys, resid, nboot=500, ax=None):
    """Return an axes of confidence bands using a bootstrap approach.

    Notes
    -----
    The bootstrap approach iteratively resampling residuals.
    It plots `nboot` number of straight lines and outlines the shape of a band.
    The density of overlapping lines indicates improved confidence.

    Returns
    -------
    ax : axes
        - Cluster of lines
        - Upper and Lower bounds (high and low) (optional)  Note: sensitive to outliers

    References
    ----------
    .. [1] J. Stults. "Visualizing Confidence Intervals", Various Consequences.
       http://www.variousconsequences.com/2010/02/visualizing-confidence-intervals.html

    """ 
    if ax is None:
        ax = plt.gca()

    bootindex = sp.random.randint

    for _ in range(nboot):
        resamp_resid = resid[bootindex(0, len(resid) - 1, len(resid))]
        # Make coeffs of for polys
        pc = sp.polyfit(xs, ys + resamp_resid, 1)                   
        # Plot bootstrap cluster
        ax.plot(xs, sp.polyval(pc, xs), "b-", linewidth=2, alpha=3.0 / float(nboot))

    return ax

Code

# Computations ----------------------------------------------------------------    
x = heights
y = weights

# Modeling with Numpy
def equation(a, b):
    """Return a 1D polynomial."""
    return np.polyval(a, b) 

p, cov = np.polyfit(x, y, 1, cov=True)                     # parameters and covariance from of the fit of 1-D polynom.
y_model = equation(p, x)                                   # model using the fit parameters; NOTE: parameters here are coefficients

# Statistics
n = weights.size                                           # number of observations
m = p.size                                                 # number of parameters
dof = n - m                                                # degrees of freedom
t = stats.t.ppf(0.975, n - m)                              # used for CI and PI bands

# Estimates of Error in Data/Model
resid = y - y_model                           
chi2 = np.sum((resid / y_model)**2)                        # chi-squared; estimates error in data
chi2_red = chi2 / dof                                      # reduced chi-squared; measures goodness of fit
s_err = np.sqrt(np.sum(resid**2) / dof)                    # standard deviation of the error

# Plotting --------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(8, 6))

# Data
ax.plot(
    x, y, "o", color="#b9cfe7", markersize=8, 
    markeredgewidth=1, markeredgecolor="b", markerfacecolor="None"
)

# Fit
ax.plot(x, y_model, "-", color="0.1", linewidth=1.5, alpha=0.5, label="Fit")  

x2 = np.linspace(np.min(x), np.max(x), 100)
y2 = equation(p, x2)

# Confidence Interval (select one)
plot_ci_manual(t, s_err, n, x, x2, y2, ax=ax)
#plot_ci_bootstrap(x, y, resid, ax=ax)
   
# Prediction Interval
pi = t * s_err * np.sqrt(1 + 1/n + (x2 - np.mean(x))**2 / np.sum((x - np.mean(x))**2))   
ax.fill_between(x2, y2 + pi, y2 - pi, color="None", linestyle="--")
ax.plot(x2, y2 - pi, "--", color="0.5", label="95% Prediction Limits")
ax.plot(x2, y2 + pi, "--", color="0.5")

#plt.show()

The following modifications are optional, originally implemented to mimic the OP's desired result.

# Figure Modifications --------------------------------------------------------
# Borders
ax.spines["top"].set_color("0.5")
ax.spines["bottom"].set_color("0.5")
ax.spines["left"].set_color("0.5")
ax.spines["right"].set_color("0.5")
ax.get_xaxis().set_tick_params(direction="out")
ax.get_yaxis().set_tick_params(direction="out")
ax.xaxis.tick_bottom()
ax.yaxis.tick_left() 

# Labels
plt.title("Fit Plot for Weight", fontsize="14", fontweight="bold")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.xlim(np.min(x) - 1, np.max(x) + 1)

# Custom legend
handles, labels = ax.get_legend_handles_labels()
display = (0, 1)
anyArtist = plt.Line2D((0, 1), (0, 0), color="#b9cfe7")    # create custom artists
legend = plt.legend(
    [handle for i, handle in enumerate(handles) if i in display] + [anyArtist],
    [label for i, label in enumerate(labels) if i in display] + ["95% Confidence Limits"],
    loc=9, bbox_to_anchor=(0, -0.21, 1., 0.102), ncol=3, mode="expand"
)  
frame = legend.get_frame().set_edgecolor("0.5")

# Save Figure
plt.tight_layout()
plt.savefig("filename.png", bbox_extra_artists=(legend,), bbox_inches="tight")

plt.show()

Output

Using plot_ci_manual():

Using plot_ci_bootstrap():

Hope this helps. Cheers.


Details

  1. I believe that since the legend is outside the figure, it does not show up in matplotblib's popup window. It works fine in Jupyter using %maplotlib inline.

  2. The primary confidence interval code (plot_ci_manual()) is adapted from another source producing a plot similar to the OP. You can select a more advanced technique called residual bootstrapping by uncommenting the second option plot_ci_bootstrap().

Updates

  1. This post has been updated with revised code compatible with Python 3.
  2. stats.t.ppf() accepts the lower tail probability. According to the following resources, t = sp.stats.t.ppf(0.95, n - m) was corrected to t = sp.stats.t.ppf(0.975, n - m) to reflect a two-sided 95% t-statistic (or one-sided 97.5% t-statistic).
  3. y2 was updated to respond more flexibly with a given model (@regeneration).
  4. An abstracted equation function was added to wrap the model function. Non-linear regressions are possible although not demonstrated. Amend appropriate variables as needed (thanks @PJW).

See Also

  • This post on plotting bands with statsmodels library.
  • This tutorial on plotting bands and computing confidence intervals with uncertainties library (install with caution in a separate environment).

这篇关于在散点图中显示置信限和预测限的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆