Why does this Kernel Density Estimation have values over 1.0?

Question

I'm trying to analyse the features of the Pima Indians Diabetes Data Set (follow the link to get the dataset) by plotting their probability density distributions. I haven't yet removed the invalid 0 values, so the plots sometimes show a bias at the far left. For the most part, the distributions look accurate.

I have a problem with the plot for DiabetesPedigree, which shows probabilities over 1.0 (for x roughly between 0.1 and 0.5). As I understand it, the combined probabilities should equal 1.0.

I've isolated the code for the DiabetesPedigree plot, but the same code works for the other features if you change the dataset_index value:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

dataset_index = 6
feature_name = "DiabetesPedigree"
filename = 'pima-indians-diabetes.data.csv'

# Note: pass header=None to read_csv if the CSV file has no header row
data = pd.read_csv(filename)
# .ix has been removed from pandas; .iloc selects the column by position
feature_data = data.iloc[:, dataset_index]

graph_min = feature_data.min()
graph_max = feature_data.max()

# A scalar bw_method is used directly as the KDE bandwidth factor
density = gaussian_kde(feature_data, bw_method=0.25)

# Evaluate the estimated density at 200 evenly spaced points
xs = np.linspace(graph_min, graph_max, 200)
ys = density(xs)

plt.xlim(graph_min, graph_max)
plt.title(feature_name)
plt.plot(xs, ys)

plt.show()

Recommended answer

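As rightly pointed out, nothing requires a continuous pdf to stay below 1. For a continuous random variable, the value p(x) of the density function is not a probability. A probability is obtained by integrating the density over an interval, and it is the total area under the curve that must equal 1.0, so the curve can rise above 1 wherever the data are concentrated on a narrow range. You can refer to any introduction to continuous random variables and their distributions.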
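To make this concrete, here is a minimal sketch that checks the point numerically. It does not use the Pima dataset: the sample is synthetic (uniform draws between 0.1 and 0.5, chosen only as an assumption to mimic a feature concentrated on a narrow range), and the variable names are illustrative. It uses `gaussian_kde.integrate_box_1d`, which integrates the estimated density between two bounds.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for a feature concentrated on a narrow range
# (an assumption for illustration, not the real DiabetesPedigree data).
rng = np.random.default_rng(0)
sample = rng.uniform(0.1, 0.5, size=500)

density = gaussian_kde(sample, bw_method=0.25)

# Density values on a grid: these are heights of the curve, not probabilities.
xs = np.linspace(sample.min() - 0.5, sample.max() + 0.5, 2001)
peak = density(xs).max()

# Area under the whole curve, taken over a range far wider than the data.
total_area = density.integrate_box_1d(sample.min() - 1.0, sample.max() + 1.0)

# Probability of landing in [0.2, 0.3]: the area under the curve there.
prob_interval = density.integrate_box_1d(0.2, 0.3)

print(f"peak density value: {peak:.3f}")
print(f"total area:         {total_area:.6f}")
print(f"P(0.2 <= x <= 0.3): {prob_interval:.3f}")
```

The peak should come out well above 1 while the total area stays at 1, and the interval probability is a number between 0 and 1. The same check can be run on the `density` object from the question's code.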
