The dimensions in hist for numpy.histogram with density = True


Question

Say I have this array A:

array([ 0.0019879 , -0.00172861, -0.00527226,  0.00639585, -0.00242005,
   -0.00717373,  0.00371651,  0.00164218,  0.00034572, -0.00864304,
   -0.00639585,  0.006828  ,  0.00354365,  0.00043215, -0.00440795,
    0.00544512,  0.00319793,  0.00164218,  0.00025929, -0.00155575,
    0.00129646,  0.00259291, -0.0039758 ,  0.00328436,  0.00207433,
    0.0011236 ,  0.00440795,  0.00164218, -0.00319793,  0.00233362,
    0.00025929,  0.00017286,  0.0008643 ,  0.00363008])

If I run:

np.histogram(A, bins=9, density=True)

I get this for hist:

array([  34.21952021,   34.21952021,   34.21952021,   34.21952021,
     34.21952021,  188.20736116,  102.65856063,   68.43904042,
     51.32928032])

The manual says:

"If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function."

I thought I had a good understanding of histograms and density functions, but I really don't understand what those values represent or how they are calculated.

I need to reproduce those values in R, as I am porting some code between the two languages.

Recommended answer

In R, you can use the hist() function to plot your histogram. In addition, hist() is an S3 function whose return value is a list of class "histogram", so the numbers behind the plot can be inspected directly.

A <- c(0.0019879 , -0.00172861, -0.00527226,  0.00639585, -0.00242005,
        -0.00717373,  0.00371651,  0.00164218,  0.00034572, -0.00864304,
        -0.00639585,  0.006828  ,  0.00354365,  0.00043215, -0.00440795,
        0.00544512,  0.00319793,  0.00164218,  0.00025929, -0.00155575,
        0.00129646,  0.00259291, -0.0039758 ,  0.00328436,  0.00207433,
        0.0011236 ,  0.00440795,  0.00164218, -0.00319793,  0.00233362,
        0.00025929,  0.00017286,  0.0008643 ,  0.00363008)

Here is the default histogram produced by R for your vector A.

hist(A)

Here is the same histogram drawn on the density scale, with a kernel density curve added as an extra layer.

hist(A, freq = FALSE)            # plot densities instead of counts
lines(density(A), col = 'red')   # overlay a kernel density estimate

Let us store the list returned by hist(A) in p.

p <- hist(A)

We can now look at the contents of the list p.

str(p)
# List of 6
#  $ breaks  : num [1:10] -0.01 -0.008 -0.006 -0.004 -0.002 0 0.002 0.004 0.006 0.008
#  $ counts  : int [1:9] 1 2 2 3 2 12 8 2 2
#  $ density : num [1:9] 14.7 29.4 29.4 44.1 29.4 ...
#  $ mids    : num [1:9] -0.009 -0.007 -0.005 -0.003 -0.001 0.001 0.003 0.005 0.007
#  $ xname   : chr "A"
#  $ equidist: logi TRUE
#  - attr(*, "class")= chr "histogram"
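The density entries in this listing can be checked by hand from counts and breaks: each one is the bin count divided by (number of observations × bin width). The following is a minimal Python sketch of that arithmetic, with the numbers copied from the str(p) output above; the variable names are only illustrative.

```python
# Values copied from the str(p) listing above.
counts = [1, 2, 2, 3, 2, 12, 8, 2, 2]
breaks = [-0.01, -0.008, -0.006, -0.004, -0.002, 0.0, 0.002, 0.004, 0.006, 0.008]

n = sum(counts)  # 34 observations in total

# Width of each bar: difference between consecutive breakpoints.
widths = [right - left for left, right in zip(breaks, breaks[1:])]

# Density of a bar = count / (n * width).
density = [c / (n * w) for c, w in zip(counts, widths)]

print([round(d, 1) for d in density])
# [14.7, 29.4, 29.4, 44.1, 29.4, 176.5, 117.6, 29.4, 29.4]
```

The first five values match the 14.7 29.4 29.4 44.1 29.4 that R prints before truncating the vector.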

The density component holds the estimated density for each bar, that is, the bar's count divided by (total number of observations × bar width). A density can exceed 1, but the area under the histogram must equal 1. The width of each bar is the difference between consecutive breakpoints (breaks). Thus, if we multiply the width of each bar by the corresponding p$density and add the results, we should get a sum of 1.

sum(diff(p$breaks) * p$density)
# [1] 1
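The same rule accounts for the NumPy values in the question. With an integer bins argument, NumPy uses equal-width bins spanning min(A) to max(A), whereas the R output above uses R's default "pretty" breaks (-0.01, -0.008, ...), which is why the two sets of densities differ. Below is a rough pure-Python sketch of the equal-width calculation. The helper density_histogram is hypothetical, written for illustration only, and is not a NumPy function. Because the array in the question is printed with rounded digits, the results agree with NumPy's only to about four decimal places.

```python
A = [0.0019879, -0.00172861, -0.00527226, 0.00639585, -0.00242005,
     -0.00717373, 0.00371651, 0.00164218, 0.00034572, -0.00864304,
     -0.00639585, 0.006828, 0.00354365, 0.00043215, -0.00440795,
     0.00544512, 0.00319793, 0.00164218, 0.00025929, -0.00155575,
     0.00129646, 0.00259291, -0.0039758, 0.00328436, 0.00207433,
     0.0011236, 0.00440795, 0.00164218, -0.00319793, 0.00233362,
     0.00025929, 0.00017286, 0.0008643, 0.00363008]


def density_histogram(values, bins):
    """Hypothetical helper: equal-width bins from min to max, density-normalised."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The last bin includes its right edge, so clamp the maximum into it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    # Density = count / (n * width), so that sum(density * width) is 1.
    density = [c / (n * width) for c in counts]
    return counts, density, width


counts, density, width = density_histogram(A, 9)
print(counts)
# [2, 2, 2, 2, 2, 11, 6, 4, 3]
print([round(d, 2) for d in density])
# [34.22, 34.22, 34.22, 34.22, 34.22, 188.21, 102.66, 68.44, 51.33]
print(sum(d * width for d in density))  # approximately 1.0
```

So 34.2195 is simply 2 / (34 × 0.001719), where 0.001719 is the common bin width. To reproduce these numbers in R, passing the same ten equally spaced edges to hist() through its breaks argument should work, though that is not verified here.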
