Advice on calculating a function to describe the upper bound of data


Question

I have a scatter plot of a dataset and I am interested in calculating the upper bound of the data. I don't know whether this is a standard statistical approach, so I was considering splitting the X-axis data into small ranges, calculating the maximum for each range, and then trying to identify a function that describes these points. Is there already a function in R to do this?

If it's relevant, there are 92611 points.

Recommended answer

You might like to look into quantile regression, which is available in the quantreg package. Whether this is useful will depend on whether you want the absolute maximum within your "windows" or whether some extreme quantile, say the 95th or 99th percentile, is acceptable. If you are not familiar with quantile regression, consider linear regression, which fits a model for the expectation (mean) of the response, conditional upon the model covariates. Quantile regression for the middle quantile (0.5) fits a model to the median response instead, again conditional upon the model covariates.
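For comparison, the windowed approach described in the question can be sketched in base R. This is only an illustrative sketch, not part of the original answer: the bin count of 50 and the names `bins`, `bin_max`, `bin_q99`, and `bin_mid` are assumptions chosen for the example, and the dummy data simply mimic a decaying trend with noise.

```r
# Dummy data (assumed shape: a decaying trend plus Gaussian noise)
set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))

# Split X into 50 equal-width windows (the bin count is an arbitrary choice)
bins <- cut(DF$X, breaks = 50)

# Absolute maximum and 0.99 quantile of Y within each window
bin_max <- tapply(DF$Y, bins, max)
bin_q99 <- tapply(DF$Y, bins, quantile, probs = 0.99)

# Mean of X within each window, used as the plotting position
bin_mid <- tapply(DF$X, bins, mean)

plot(Y ~ X, data = DF, col = "grey")
points(bin_mid, bin_max, col = "blue", pch = 16)  # per-window maxima
points(bin_mid, bin_q99, col = "red", pch = 16)   # per-window 0.99 quantiles
```

The per-window maxima depend on single extreme points, so they tend to be noisier than the per-window quantiles. Either set of points would still need a curve fitted through it, which is the step quantile regression handles directly.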

Here is an example using the quantreg package to show what I mean. First, generate some dummy data similar to the data you show:

set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))
plot(Y ~ X, data = DF)

Next, fit the model to the 99th percentile (or the 0.99 quantile):

library(quantreg)  # provides rq()
mod <- rq(Y ~ log(X), data = DF, tau = .99)

To generate the "fitted line", we predict from the model at 100 equally spaced values of X:

pDF <- data.frame(X = seq(1, 5000, length.out = 100))
pDF <- within(pDF, Y <- predict(mod, newdata = pDF))

and add the fitted model to the plot:

lines(Y ~ X, data = pDF, col = "red", lwd = 2)

This should give you the scatter plot with the fitted 0.99 quantile line overlaid in red.
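One way to sanity-check such a fit is to compare the share of observations lying on or below the fitted line with the requested quantile. The following is a minimal sketch, assuming the quantreg package is installed; the name `share_below` is illustrative and not part of the original answer.

```r
library(quantreg)  # provides rq()

# Same dummy data as in the example above
set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))

# Fit the 0.99 conditional quantile of Y as a linear function of log(X)
mod <- rq(Y ~ log(X), data = DF, tau = 0.99)

# Share of observations on or below the fitted line; expected to be near 0.99
share_below <- mean(DF$Y <= fitted(mod))
print(share_below)
```

Lowering `tau` (for example to 0.95) gives a less extreme bound that is less sensitive to individual outlying points.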
