How to calculate the area of a ggplot stat_ellipse() when type = "norm"?
Question
Is there any way to calculate the area of this ellipse when type = "norm"?
The default is type = "t". Setting type = "norm" displays a different ellipse because it assumes a multivariate normal distribution instead of a multivariate t-distribution.
Here is the code and the plot (using code similar to another post):
library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(type = "norm")
The previous answer was:
# Plot object
p <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(segments = 201) # Default is 51. We use a finer grid for more accurate area.
# Get ellipse coordinates from plot
pb <- ggplot_build(p)
el <- pb$data[[2]][c("x", "y")]
# Center of ellipse
ctr <- MASS::cov.trob(el)$center
# I tried changing this to 'stats::cov.wt' instead of 'MASS::cov.trob',
# based on what I saw at https://github.com/tidyverse/ggplot2/blob/master/R/stat-ellipse.R#L98
# Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))
# Calculate area of ellipse from semi-major and semi-minor axes.
# These are, respectively, the largest and smallest values of dist2center.
pi * min(dist2center) * max(dist2center)
Changing to stats::cov.wt wasn't enough to get the area of the "norm" ellipse (the calculated value was the same). Any ideas on how to change the code?
Thanks!
Recommended answer
Nice question, I learned something. However, I cannot reproduce your problem: I get (of course) different values with the different approaches.
I think the approach in the linked answer is not quite correct, because the ellipse center is calculated from the ellipse coordinates rather than from the data. I have updated the code to calculate the center from the data.
library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
p_norm <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(type = "norm")
pb <- ggplot_build(p_norm)
el <- pb$data[[2]][c("x", "y")]
ctr <- MASS::cov.trob(data)$center #updated previous answer here
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))
pi * min(dist2center) * max(dist2center)
#> [1] 18.40872
Created on 2020-02-27 by the reprex package (v0.3.0)
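If the goal is only the area enclosed by the drawn outline, one alternative that avoids estimating a center at all is the shoelace formula applied to the ellipse coordinates. The following is a minimal sketch rather than part of the original answer; `polygon_area` is a hypothetical helper name, and the self-check uses a synthetic ellipse so that it runs without ggplot2.

```r
# Area of a simple polygon from its vertex coordinates (shoelace formula).
# `polygon_area` is a hypothetical helper introduced for this sketch.
polygon_area <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n >= 3)
  # Pair each vertex with the next one, wrapping the last back to the first.
  nxt <- c(2:n, 1)
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Self-check on a known shape: an ellipse with semi-axes 3 and 2 has area 6 * pi.
theta <- seq(0, 2 * pi, length.out = 2001)
approx_area <- polygon_area(3 * cos(theta), 2 * sin(theta))
exact_area <- pi * 3 * 2

# With the objects from the answer above, the call would look like:
# el <- ggplot_build(p_norm)$data[[2]][c("x", "y")]
# polygon_area(el$x, el$y)
```

Because the outline is a polygon inscribed in the true ellipse, this slightly underestimates the area; a larger `segments` value in stat_ellipse() reduces the gap. Note also that the earlier snippet quoted in the question calls stat_ellipse(segments = 201) without type = "norm", which may be why swapping the center estimator alone left the value unchanged.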
Update, thanks to Axeman for the thoughts.
The area can be calculated directly from the covariance matrix by first computing its eigenvalues. You need to scale the variances / eigenvalues by the factor for the confidence level that you want. This blog helped me a lot to understand this a bit better.
set.seed(1234)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
cov_dat <- cov(dat) # covariance matrix
eig_dat <- eigen(cov_dat)$values # eigenvalues of covariance matrix
vec <- sqrt(5.991 * eig_dat) # semi-axis lengths (half the major and minor axes) of the 95% confidence ellipse
pi * vec[1] * vec[2]
#> [1] 18.38858
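Since the product of the eigenvalues equals the determinant of the covariance matrix, the same area can be written without an eigendecomposition. The sketch below is an illustration under stated assumptions, not the answerer's code: `ellipse_area` and its `scaling` argument are hypothetical names. The "F" option reflects my reading of the ggplot2 source linked in the question, where the radius for type = "norm" appears to be based on `2 * qf(level, 2, n - 1)` rather than on the chi-square quantile; treat that as an assumption to verify against your installed version.

```r
# Area of the normal-theory ellipse, computed from the data alone.
# `ellipse_area` and `scaling` are hypothetical names introduced for this sketch.
ellipse_area <- function(xy, level = 0.95, scaling = c("chisq", "F")) {
  scaling <- match.arg(scaling)
  n <- nrow(xy)
  sigma <- stats::cov(xy) # 2 x 2 sample covariance matrix
  k <- if (scaling == "chisq") {
    stats::qchisq(level, df = 2)   # large-sample factor (about 5.991 at 95%)
  } else {
    2 * stats::qf(level, 2, n - 1) # assumed ggplot2-style factor, see lead-in
  }
  # sqrt(det(sigma)) equals the product of the square roots of the eigenvalues.
  pi * k * sqrt(det(sigma))
}

set.seed(1234)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
area_chisq <- ellipse_area(dat, scaling = "chisq")
area_f <- ellipse_area(dat, scaling = "F")
```

With 1000 observations the two factors differ only slightly, so either scaling should land close to the value obtained from the plotted coordinates.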
In this particular case the covariance is close to zero, because x and y are generated independently, so the eigenvalues are approximately the variances of the two variables. You can therefore use just the standard deviations for the calculation, given that both variables are normally distributed.
set.seed(1234)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814
The factor 5.991 is the 0.95 quantile of the chi-square distribution with 2 degrees of freedom (qchisq(0.95, df = 2) in R); it sets the size of the ellipse that covers 95% of a bivariate normal distribution. For more information, see this thread.
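For readers who want to see where 5.991 comes from, here is a short derivation under the assumption that the data are bivariate normal with covariance matrix \Sigma (eigenvalues \lambda_1, \lambda_2) and that the sample is large enough to treat \Sigma as known. The symbols c and p are introduced only for this sketch.

```latex
% Squared Mahalanobis distance of a bivariate normal point follows chi-square with 2 df:
\[
  d^2 = (\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \sim \chi^2_2,
  \qquad
  P\!\left(d^2 \le c\right) = 1 - e^{-c/2}.
\]
% Solving for the cutoff c at coverage p = 0.95:
\[
  c = -2\ln(1-p) = -2\ln(0.05) \approx 5.991.
\]
% The region d^2 <= c is an ellipse with semi-axes sqrt(c*lambda_1) and sqrt(c*lambda_2):
\[
  A = \pi\,\sqrt{c\,\lambda_1}\,\sqrt{c\,\lambda_2}
    = \pi\,c\,\sqrt{\lambda_1\lambda_2}
    = \pi\,c\,\sqrt{\det\Sigma}.
\]
```

When the covariance is near zero, \sqrt{\det\Sigma} is approximately the product of the two standard deviations, which is the shortcut used in the last snippet above.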