R: area under curve of ogive?

Question

I have an algorithm that uses an x,y plot of sorted y data to produce an ogive.

I then compute the area under the curve to derive percentages.

I'd like to do something similar using kernel density estimation. I like how kernel densities smooth out the upper and lower bounds (i.e. the min and max will extend slightly beyond my hard-coded input).

Either way... I was wondering if there is a way to treat an ogive as a type of cumulative distribution function and/or use kernel density estimation to derive a cumulative distribution function given y data?

I apologize if this is a confusing question. I know there is a way to derive a cumulative frequency graph (i.e. ogive). However, I can't determine how to derive a % given this cumulative frequency graph.

What I don't want is an ecdf. I know how to do that, and I am not quite trying to capture an ecdf. Rather, I want to integrate an ogive given two intervals.

Recommended answer

I'm not exactly sure what you have in mind, but here's a way to calculate the area under the curve for a kernel density estimate, or more generally for any case where you have y values at equally spaced x values (you can, of course, generalize to variable x intervals as well):

library(zoo)

# Kernel density estimate
# Set n to higher value to get a finer grid
set.seed(67839)
dens = density(c(rnorm(500,5,2),rnorm(200,20,3)), n=2^5)

# How to extract the x and y values of the density estimate
#dens$y
#dens$x

# x interval
dx = median(diff(dens$x))

# mean height for each pair of y values
h = rollmean(dens$y, 2)

# Area under curve
sum(h*dx)  # 1.000943

# Cumulative area
# cumsum(h*dx)

# Plot density, showing points at which density is calculated 
plot(dens)
abline(v=dens$x, col="#FF000060", lty="11")

# Plot cumulative area under curve, showing mid-point of each x-interval
plot(dens$x[-length(dens$x)] + 0.5*dx, cumsum(h*dx), type="l")
abline(v=dens$x[-length(dens$x)] + 0.5*dx, col="#FF000060", lty="11")
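The cumulative area above can also be turned into a percentage between two bounds. The following is only a sketch under stated assumptions: the bounds `a` and `b`, the finer grid size, and the helper name `kde_cdf` are illustrative choices, not part of the original answer. It uses base R only, computes the trapezoidal cumulative area at each grid point, and interpolates it with `approxfun`.

```r
# Sketch: share of the kernel density's area between two bounds.
# a, b, kde_cdf and the grid size are illustrative assumptions.
set.seed(67839)
y <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))
dens <- density(y, n = 2^10)  # finer grid than in the example above

# Trapezoid rule: interval widths times the mean height of each pair of y values
dx <- diff(dens$x)
h <- (head(dens$y, -1) + tail(dens$y, -1)) / 2

# Cumulative area at every grid point, starting from 0 at the left edge
cum_area <- c(0, cumsum(h * dx))
total_area <- max(cum_area)

# Linear interpolation of the cumulative area gives a CDF-like function
kde_cdf <- approxfun(dens$x, cum_area, yleft = 0, yright = total_area)

# Fraction of the total area that lies between a and b
a <- 0
b <- 10
pct_between <- (kde_cdf(b) - kde_cdf(a)) / total_area
pct_between
```

Dividing by `total_area` rescales the result so that the full curve counts as 100%, which absorbs the small numerical error in the trapezoidal sum. Because `diff(dens$x)` is used instead of a single `dx`, the same code should also work for unequally spaced x values.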

UPDATE to include ecdf function

To address your comments, look at the two plots below. The first is the empirical cumulative distribution function (ECDF) of the mixture of normal distributions that I used above. Note that this plot has the same shape as the cumulative-area plot above. The second is a plot of the ECDF of a standard normal distribution (mean = 0, sd = 1).

set.seed(67839)
x = c(rnorm(500,5,2),rnorm(200,20,3))
plot(ecdf(x), do.points=FALSE)

plot(ecdf(rnorm(1000)))
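For comparison, a percentage can be read directly off an ECDF, since `ecdf()` returns a function. This is a minimal sketch; the bounds `a` and `b` and the names `Fn` and `pct_ecdf` are illustrative assumptions rather than anything from the question.

```r
# Sketch: fraction of observations in the interval (a, b], read from the ECDF.
# a, b, Fn and pct_ecdf are illustrative assumptions.
set.seed(67839)
x <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))

Fn <- ecdf(x)  # step function: Fn(q) is the proportion of x values <= q

a <- 0
b <- 10
pct_ecdf <- Fn(b) - Fn(a)
pct_ecdf
```

Unlike the kernel density approach, the ECDF does not smooth past the observed minimum and maximum, so the two percentages will generally differ slightly.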
