Removing outliers from convex hull

Problem description

I have a few datasets that I'd like to visualise with a convex hull (and derive some statistics from that convex hull). However, each dataset contains some noise. As a result, the convex hull covers not only the points in the main data cloud but also all the outliers, which makes its area fairly large and not very different between datasets. An example of the dataset may be seen below:

The whole area is not unimodal, but we can certainly observe some outliers (especially on the left) that distort the shape of the convex hull. The estimated KDE looks like this:

Therefore, I'd like to remove those outliers. What algorithm could be used to fit a minimal-area convex hull to n - k points from the dataset, where k is chosen to correspond to a given percentage of observations?

Please note that the picture is only an example; in practice I am working with a large number of different datasets.

Recommended answer

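Here is an approach in R. It drops the k points furthest from the centroid and recomputes the hull, which is a simple heuristic rather than a guaranteed minimal-area hull: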

set.seed(42)
#DATA
x = rnorm(20)
y = rnorm(20)

#Run convex hull
i = chull(x, y)

#Draw original data and convex hull
graphics.off()
plot(x, y, pch = 19, cex = 2)
polygon(x[i], y[i])

#Get coordinates of the center
x_c = mean(x)
y_c = mean(y)

#Calculate the Euclidean distance of each point from the center
d = sqrt((x - x_c)^2 + (y - y_c)^2)

#Remove the k points furthest from the center
#(seq_len() also handles k = 0, where head(..., -k) would return nothing)
k = 2
keep = order(d)[seq_len(length(d) - k)]
x2 = x[keep]
y2 = y[keep]
i2 = chull(x2, y2)

#Draw the smaller convex hull
points(x2, y2, pch = 19, col = "red")
polygon(x2[i2], y2[i2], border = "red", lty = 2)
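
The question asks for k to be derived from a percentage of observations and for statistics such as the hull area. As a minimal sketch of how the steps above might be wrapped up, the hypothetical helper `trimmed_hull()` below (the name, the `frac` parameter, and the returned list are assumptions, not part of the original answer) drops a given fraction of the points furthest from the centroid and computes the area of the remaining hull with the shoelace formula. It uses the same centroid-distance heuristic, so it does not guarantee the minimal-area hull over all subsets of n - k points.

```r
# Hypothetical helper (not from the original answer): drop a fraction of
# the points furthest from the centroid, then return the convex hull of
# the remaining points together with its area.
trimmed_hull <- function(x, y, frac = 0.1) {
  stopifnot(length(x) == length(y), frac >= 0, frac < 1)
  n <- length(x)
  k <- floor(frac * n)                          # number of points to drop

  # Euclidean distance of each point from the centroid
  d <- sqrt((x - mean(x))^2 + (y - mean(y))^2)

  # Indices of the n - k points closest to the centroid
  keep <- order(d)[seq_len(n - k)]

  # Hull vertices, expressed as indices into the original x and y
  hull <- keep[chull(x[keep], y[keep])]

  # Shoelace formula over the hull vertices (taken in hull order)
  hx <- x[hull]
  hy <- y[hull]
  nxt <- c(seq_along(hull)[-1], 1)              # index of the next vertex
  area <- abs(sum(hx * hy[nxt] - hx[nxt] * hy)) / 2

  list(k = k, keep = keep, hull = hull, area = area)
}

# Example data: a unit square with an interior point and one far outlier
px <- c(0, 1, 1, 0, 0.5, 10)
py <- c(0, 0, 1, 1, 0.5, 0.5)
full    <- trimmed_hull(px, py, frac = 0)       # hull of all points
trimmed <- trimmed_hull(px, py, frac = 0.2)     # drops floor(0.2 * 6) = 1 point
```

With the data from the answer, something like `res = trimmed_hull(x, y, frac = 0.1)` followed by `polygon(x[res$hull], y[res$hull], border = "blue")` would draw the trimmed hull, and `res$area` could then be compared between datasets. Using the mean as the center is itself sensitive to outliers; a median-based center may be worth trying, though that is a design choice beyond what the original answer covers.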
