How to draw a matching Bell curve over a histogram?

Problem description

Here is my code so far. I'm very new to programming and have been trying for a while.

Here I apply the Box-Muller transform to approximate two Gaussian normal distributions starting from a random uniform sampling. Then, I create a histogram for both of them.

Now, I would like to compare the obtained histograms with "the real thing": a standard Bell curve. How to draw such a curve to match the histograms?

import numpy as np
import matplotlib.pyplot as plt

N = 10000
z1 = np.random.uniform(0, 1.0, N)
z2 = np.random.uniform(0, 1.0, N)

R_sq = -2 * np.log(z1)
theta = 2 * np.pi * z2
z1 = np.sqrt(R_sq) * np.cos(theta)
z2 = np.sqrt(R_sq) * np.sin(theta)

fig = plt.figure()
ax = fig.add_subplot(2, 1, 1)
ax.hist(z1, bins=40, range=(-4, 4), color='red')
plt.title("Histogram")
plt.xlabel("z1")
plt.ylabel("frequency")
ax2 = fig.add_subplot(2, 1, 2)
ax2.hist(z2, bins=40, range=(-4, 4), color='blue')
plt.xlabel("z2")
plt.show()

Solution

To obtain a kernel density estimate (KDE), scipy.stats.gaussian_kde computes a smooth function fitted to the data.

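To just draw a Gaussian normal curve, there is scipy.stats.norm. Subtracting the mean and dividing by the standard deviation adapts the curve to the given data; passing the mean and standard deviation as loc and scale to norm.pdf does this while keeping the area under the curve equal to one.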
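A small sketch of how the standardization relates to the loc and scale parameters of scipy.stats.norm.pdf. The sample data, the seed and the variable names are made up for illustration. The point it tries to show: standardizing x by hand needs an extra division by the standard deviation to remain a density with area one, which loc/scale handle automatically.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
data = rng.normal(loc=2.0, scale=3.0, size=10_000)  # hypothetical sample

mean, std = data.mean(), data.std()
x = np.linspace(mean - 4 * std, mean + 4 * std, 2001)

# Standardizing x by hand needs an extra 1/std factor to stay a density.
pdf_manual = stats.norm.pdf((x - mean) / std) / std
# Passing loc and scale applies both the shift and the 1/std factor.
pdf_scipy = stats.norm.pdf(x, loc=mean, scale=std)

same = np.allclose(pdf_manual, pdf_scipy)
# Riemann-sum approximation of the area under the curve on [-4 std, +4 std]
area = np.sum(pdf_scipy) * (x[1] - x[0])
print(same, round(area, 3))
```

Without the 1/std factor the area under the curve would be std rather than one, so the curve would only match the histogram when the data happens to have a standard deviation close to one.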

Both curves are drawn such that the area under the curve equals one. To match them to the size of the histogram, scale them by the number of data points times the bin width. Alternatively, leave the curves unscaled and normalize the histogram instead by adding the parameter hist(..., density=True).

In the demo code the data is deliberately distorted to illustrate the difference between the KDE and the Gaussian normal.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

x = np.linspace(-4,4,1000)
N = 10000
z1 = np.random.randint(1, 3, N) * np.random.uniform(0, .4, N)
z2 = np.random.uniform(0, 1, N)

R_sq = -2 * np.log(z1)
theta = 2 * np.pi * z2
z1 = np.sqrt(R_sq) * np.cos(theta)
z2 = np.sqrt(R_sq) * np.sin(theta)

fig = plt.figure(figsize=(12,4))
for ind_subplot, zi, col in zip((1, 2), (z1, z2), ('crimson', 'dodgerblue')):
    ax = fig.add_subplot(1, 2, ind_subplot)
    ax.hist(zi, bins=40, range=(-4, 4), color=col, label='histogram')
    ax.set_xlabel("z"+str(ind_subplot))
    ax.set_ylabel("frequency")

    binwidth = 8 / 40
    scale_factor = len(zi) * binwidth

    gaussian_kde_zi = stats.gaussian_kde(zi)  # fit the KDE to the data of this subplot
    ax.plot(x, gaussian_kde_zi(x)*scale_factor, color='springgreen', linewidth=3, label='kde')

    std_zi = np.std(zi)
    mean_zi = np.mean(zi)
    # loc/scale shift and stretch the curve while keeping its area equal to one
    ax.plot(x, stats.norm.pdf(x, loc=mean_zi, scale=std_zi)*scale_factor, color='black', linewidth=2, label='normal')
    ax.legend()

plt.show()
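The alternative mentioned above, normalizing the histogram with density=True instead of scaling the curves, could look roughly like the sketch below. It uses undistorted Box-Muller data; the seed, the variable names u1 and u2, and the output filename are assumptions made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
N = 10000
u1 = 1.0 - rng.uniform(0, 1, N)  # values in (0, 1], so log(u1) is finite
u2 = rng.uniform(0, 1, N)
z = np.sqrt(-2 * np.log(u1)) * np.cos(2 * np.pi * u2)  # Box-Muller transform

x = np.linspace(-4, 4, 1000)
fig, ax = plt.subplots()

# density=True rescales the bars so that their total area is one
heights, edges, _ = ax.hist(z, bins=40, range=(-4, 4), density=True,
                            color='crimson', label='histogram')

# Both curves already integrate to one, so no scale factor is needed
ax.plot(x, stats.gaussian_kde(z)(x), color='springgreen', linewidth=3, label='kde')
ax.plot(x, stats.norm.pdf(x, loc=z.mean(), scale=z.std()),
        color='black', linewidth=2, label='normal')

ax.set_xlabel("z")
ax.set_ylabel("density")
ax.legend()
fig.savefig("hist_density.png")  # hypothetical filename; plt.show() works as well
```

With this variant the y-axis shows a probability density rather than counts, so the axis label changes accordingly.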

The original values for z1 and z2 very much resemble a normal distribution, and so the black line (the Gaussian normal for the data) and the green line (the KDE) very much resemble each other.

The current code first calculates the actual mean and standard deviation of the data. Since you want to compare against a perfect Gaussian normal, you should plot the curve with mean zero and standard deviation one. You'll see they're almost identical on the plot.
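A sketch of that comparison, under the assumption that the histograms show raw counts as in the question: the standard normal pdf (mean zero, standard deviation one) is scaled by the number of samples times the bin width. The seed, the helper variable names and the output filename are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed, only for reproducibility
N = 10000
u1 = 1.0 - rng.uniform(0, 1, N)  # values in (0, 1], so log(u1) is finite
u2 = rng.uniform(0, 1, N)
r = np.sqrt(-2 * np.log(u1))
z1 = r * np.cos(2 * np.pi * u2)  # Box-Muller transform, first variate
z2 = r * np.sin(2 * np.pi * u2)  # Box-Muller transform, second variate

bins, lo, hi = 40, -4, 4
scale_factor = N * (hi - lo) / bins  # number of samples times bin width
x = np.linspace(lo, hi, 1000)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, zi, name, col in zip(axes, (z1, z2), ("z1", "z2"), ('crimson', 'dodgerblue')):
    ax.hist(zi, bins=bins, range=(lo, hi), color=col, label='histogram')
    # stats.norm.pdf(x) with default loc=0, scale=1 is the standard Bell curve
    ax.plot(x, stats.norm.pdf(x) * scale_factor, color='black',
            linewidth=2, label='standard normal')
    ax.set_xlabel(name)
    ax.set_ylabel("frequency")
    ax.legend()
    print(name, "mean:", zi.mean(), "std:", zi.std())

fig.savefig("hist_standard_normal.png")  # hypothetical filename; plt.show() works as well
```

The printed sample mean and standard deviation should land close to 0 and 1, which is why the curve fitted to the data and the standard curve are hard to tell apart.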
