How to fit a curve to a histogram
Problem description
I've explored similar questions asked about this topic but I am having some trouble producing a nice curve on my histogram. I understand that some people may see this as a duplicate but I haven't found anything currently to help solve my problem.
Although the data isn't visible here, here are some variables I am using, so you can see what they represent in the code below.
Differences <- subset(Score_Differences, select = Difference, drop = TRUE)
m = mean(Differences)
std = sqrt(var(Differences))
Here is the very first curve I produced (this code seems to be the most common and the easiest to write, but the curve itself doesn't fit that well).
hist(Differences, density = 15, breaks = 15, probability = TRUE, xlab = "Score Differences", ylim = c(0,.1), main = "Normal Curve for Score Differences")
curve(dnorm(x,m,std),col = "Red", lwd = 2, add = TRUE)
I really like this next one, but I don't like the curve extending into the negative region.
hist(Differences, probability = TRUE)
lines(density(Differences), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2), lwd = 2, col = "Blue")
This is the same histogram as the first, but with frequencies. It still doesn't look that nice.
h = hist(Differences, density = 15, breaks = 15, xlab = "Score Differences", main = "Normal Curve for Score Differences")
xfit = seq(min(Differences),max(Differences))
yfit = dnorm(xfit,m,std)
yfit = yfit*diff(h$mids[1:2])*length(Differences)
lines(xfit, yfit, col = "Red", lwd = 2)
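The rescaling of yfit above relies on the fact that, for equal-width bins, count = density × n × bin width. The following is a minimal sketch that checks this relationship on simulated data; the vector `scores` is a hypothetical stand-in, since the original `Score_Differences` data is not shown.

```r
## Hypothetical stand-in data; the original Score_Differences is not available.
set.seed(1)
scores <- rexp(500, rate = 0.2)

## plot = FALSE returns the histogram object without drawing it
h <- hist(scores, breaks = 15, plot = FALSE)

n <- length(scores)
binwidth <- diff(h$breaks[1:2])   # bins are equal-width, so one width is enough

## For equal-width bins: counts = density * n * binwidth
rescaled <- h$density * n * binwidth
all.equal(rescaled, as.numeric(h$counts))

## The spacing of the bin midpoints equals the bin width, as used in the question
all.equal(diff(h$mids[1:2]), binwidth)
```

Multiplying any density curve (here dnorm) by the same factor puts it on the frequency scale of the histogram.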
Another attempt, but no luck. Maybe that's because I am using qnorm when the data obviously isn't normal. The curve goes into the negative direction again.
l = length(Differences)  # assumption: l is the sample size; it is not defined in the original post
sample_x = seq(qnorm(.001, m, std), qnorm(.999, m, std), length.out = l)
binwidth = 3
breaks = seq(floor(min(Differences)), ceiling(max(Differences)), binwidth)
hist(Differences, breaks)
lines(sample_x, l*dnorm(sample_x, m, std)*binwidth, col = "Red")
The only curve that looks nice visually is the second, but it extends into the negative direction.
My question is: is there a "standard way" to place a curve on a histogram? This data certainly isn't normal. Three of the procedures I presented here are from similar posts, but I am obviously having some trouble with them. I feel like every method of fitting a curve will depend on the data you're working with.
Update with solution
Thanks to Zheyuan Li and others! I will leave this up for my own reference and hopefully others as well.
hist(Differences, probability = TRUE)
lines(density(Differences, cut = 0), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2, cut = 0), lwd = 2, col = "Blue")
Answer
OK, so you are just struggling with the fact that density goes beyond the "natural range" of the data. Well, just set cut = 0. You may want to read 'plot.density extends "xlim" beyond the range of my data. Why and how to fix it?' for the reason. In that answer I was using from and to, but now I am using cut.
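To illustrate the remark about from and to versus cut, here is a minimal sketch on simulated data (the sample `x` below is a hypothetical positive variable, not the asker's data). It suggests that, with the default bandwidth, cut = 0 yields the same evaluation grid and the same estimate as passing from = min(x) and to = max(x) explicitly.

```r
## Hypothetical strictly positive sample, used only for illustration
set.seed(0)
x <- rexp(1000, 0.5)

## Two ways of restricting the evaluation grid to the observed range
d_cut    <- density(x, cut = 0)
d_fromto <- density(x, from = min(x), to = max(x))

## Both grids and both sets of density values should agree
all.equal(d_cut$x, d_fromto$x)
all.equal(d_cut$y, d_fromto$y)
```

cut = 0 is simply the shorter spelling when the limits should be the data extremes; from and to remain useful when other limits are wanted.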
## consider a mixture, that does not follow any parametric distribution family
## note, by construction, this is a strictly positive random variable
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)
## (kernel) density estimation offers a flexible nonparametric approach
d <- density(x, cut = 0)
## you can plot histogram and density on the density scale
hist(x, prob = TRUE, breaks = 50)
lines(d, col = 2)
Note that with cut = 0, the density estimate is evaluated strictly within range(x), so the plotted curve stops at the minimum and maximum of the data. Outside this range, the density is taken to be 0.
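The effect of cut on the evaluation grid can be inspected directly. The following sketch reuses the simulated mixture from the answer and compares the default (cut = 3, meaning the grid extends three bandwidths past each extreme of the data) with cut = 0; the object names `d_default` and `d_cut0` are introduced here purely for illustration.

```r
## Same simulated mixture as in the answer
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)

d_default <- density(x)            # default cut = 3
d_cut0    <- density(x, cut = 0)   # grid limited to the observed range

range(x)
range(d_cut0$x)      # matches range(x)
range(d_default$x)   # extends 3 bandwidths beyond each extreme

## The default grid starts below min(x), which is why the curve
## in the question spills to the left of the data
min(d_default$x) < min(x)
```

Because the curve is only truncated at the data extremes rather than renormalized, the area under it with cut = 0 can be slightly less than 1.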