Seaborn distplot y-axis normalisation wrong ticklabels

Problem description

Just to note, I have already checked this question and this question.

So, I'm using distplot to draw some histograms on separate subplots:

import numpy as np
#import netCDF4 as nc # used to get p0_dict
import matplotlib.pyplot as plt
from collections import OrderedDict
import seaborn.apionly as sns
import cPickle as pickle

''' 
LINK TO PICKLE
https://drive.google.com/file/d/0B8Xks3meeDq0aTFYcTZEZGFFVk0/view?usp=sharing
'''

p0_dict = pickle.load(open('/path/to/pickle/test.dat', 'r'))     

fig = plt.figure(figsize = (15,10))
ax = plt.gca()
j=1

for region, val in p0_dict.iteritems():

    val = np.asarray(val)

    subax = plt.subplot(5,5,j)

    print region

    try:              
        sns.distplot(val, bins=11, hist=True, kde=True, rug=True, 
                     ax = subax, color = 'k', norm_hist=True)

    except Exception as Ex:
        print Ex

    subax.set_title(region)
    subax.set_xlim(0, 1) # the data varies from 0 to 1

    j+=1    

plt.subplots_adjust(left = 0.06, right = 0.99, bottom = 0.07,
                    top = 0.92, wspace = 0.14, hspace = 0.6) 

fig.text(0.5, 0.02, r'$ P(W) = 0,1 $', ha ='center', fontsize = 15)
fig.text(0.02, 0.5, '% occurrence', ha ='center', 
         rotation='vertical', fontsize = 15) 
# obviously I'd multiply the fractional ticklabels by 100 to get 
# the percentage...

plt.show()

What I expect is for the area under the KDE curve to sum to 1, and for the y axis ticklabels to reflect this. However, I get the following:

As you can see, the y axis ticklabels are not in the range [0,1], as would be expected. Turning on/off norm_hist or kde does not change this. For reference, the output with both turned off:

Just to verify:

aus = np.asarray(p0_dict['AUS'])
aus_bins = np.histogram(aus, bins=11)[0]

plt.subplot(121)
plt.hist(aus,11)
plt.subplot(122)
plt.bar(range(0,11),aus_bins.astype(np.float)/np.sum(aus_bins))

plt.show()

The y ticklabels in this case properly reflect those of a normalised histogram.

What am I doing wrong?

Thank you for your help.

Solution

The y axis is a density, not a probability. I think you are expecting the normalized histogram to show a probability mass function, where the sum of the bar heights equals 1. But that's wrong; the normalization ensures that the sum of the bar heights times the bar widths equals 1. Because your data lie between 0 and 1 and are split into 11 bins, each bin is narrower than 1/11, so bar heights well above 1 are expected. This is what ensures that the normalized histogram is comparable to the kernel density estimate, which is normalized so that the area under the curve is equal to 1.
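To make the difference concrete, here is a minimal sketch using only `numpy.histogram`. The sample is hypothetical (random draws standing in for one region's values from `p0_dict`), and the variable names are mine, not from the question. It compares density heights with per-bin probabilities for the same 11 bins:

```python
import numpy as np

# Hypothetical sample standing in for one region's values in [0, 1];
# the real data would come from p0_dict in the question.
rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=500)

# density=True gives heights such that sum(height * bin width) == 1.
density, edges = np.histogram(data, bins=11, density=True)
widths = np.diff(edges)
area = np.sum(density * widths)

# Per-bin probabilities: raw counts divided by the number of samples.
counts, _ = np.histogram(data, bins=edges)
probability = counts / counts.sum()

# Multiplying each density height by its bin width recovers the probabilities.
converted = density * widths

print("area under density bars:", area)
print("sum of probabilities:   ", probability.sum())
print("largest density height: ", density.max())
print("largest probability:    ", probability.max())
```

If what you want on the y axis is the fraction of observations per bin, one option is to skip the density normalisation and pass `weights=np.ones_like(data) / data.size` to `plt.hist`. The bar heights then sum to 1, but they are no longer on the same scale as the KDE curve.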
