How does matplotlib calculate the density for a histogram

Question

Reading through the matplotlib plt.hist documentation, I see there is a density parameter that can be set to True. The documentation says:

density : bool, optional
            If ``True``, the first element of the return tuple will
            be the counts normalized to form a probability density, i.e.,
            the area (or integral) under the histogram will sum to 1.
            This is achieved by dividing the count by the number of
            observations times the bin width and not dividing by the total
            number of observations. If *stacked* is also ``True``, the sum of
            the histograms is normalized to 1.

This is achieved by dividing the count by the number of observations times the bin width and not dividing by the total number of observations

I tried replicating this with sample data.

**Using matplotlib's built-in calculation:**

import numpy as np
import pandas as pd
ser = pd.Series(np.random.normal(size=1000))
ser.hist(density=True, bins=100)

**Manual calculation of the density:**

import matplotlib.pyplot as plt

arr_hist, edges = np.histogram(ser, bins=100)
samp = arr_hist / ser.shape[0] * np.diff(edges)
plt.bar(edges[0:-1], samp)
plt.grid()

The two plots have completely different y-axis scales. Could someone point out what exactly is going wrong and how to replicate the density calculation manually?

Recommended answer

This is an ambiguity in the wording of the documentation. The sentence

This is achieved by dividing the count by the number of observations times the bin width

needs to be read as

This is achieved by dividing (the count) by (the number of observations times the bin width)

count / (number of observations * bin width)
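
The attempt in the question differs only in its missing parentheses. In Python, `/` and `*` have the same precedence and are evaluated left to right, so `arr_hist / ser.shape[0] * np.diff(edges)` computes `(count / n) * width` rather than `count / (n * width)`. A minimal sketch of the difference follows; the counts and bin edges are made-up values chosen for illustration, not data from the question.

```python
import numpy as np

# Hypothetical counts and bin edges, chosen only for illustration
counts = np.array([2, 5, 3])
edges = np.array([0.0, 0.5, 1.0, 1.5])

n = counts.sum()          # total number of observations
widths = np.diff(edges)   # width of each bin

wrong = counts / n * widths      # parsed as (counts / n) * widths
right = counts / (n * widths)    # count / (number of observations * bin width)

print("without parentheses:", wrong)
print("with parentheses:   ", right)
print("area under the correct histogram:", (right * widths).sum())
```

The two results differ by a factor of the squared bin width, which is why the y-axis scales in the question look so different when the bins are narrow.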

Complete code:

import numpy as np
import matplotlib.pyplot as plt

arr = np.random.normal(size=1000)

fig, (ax1, ax2) = plt.subplots(2)
ax1.hist(arr, density = True,  bins=100)
ax1.grid()


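# Manual density: count / (number of observations * bin width)
arr_hist, edges = np.histogram(arr, bins=100)
samp = arr_hist / (arr.shape[0] * np.diff(edges))
# align="edge" starts each bar at its left bin edge, matching ax1.hist
ax2.bar(edges[:-1], samp, width=np.diff(edges), align="edge")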
ax2.grid()

plt.show()
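
Rather than comparing the two plots by eye, the manual formula can also be checked numerically. The sketch below compares it with `np.histogram(..., density=True)`, which normalizes the same way, and confirms that the bar areas sum to 1. The seed and variable names are arbitrary choices and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
arr = rng.normal(size=1000)

counts, edges = np.histogram(arr, bins=100)
widths = np.diff(edges)

# Manual density: count / (number of observations * bin width)
manual = counts / (arr.shape[0] * widths)

# Reference density computed by NumPy itself
reference, _ = np.histogram(arr, bins=100, density=True)

print("manual matches reference:", np.allclose(manual, reference))
print("total area:", (manual * widths).sum())
```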
