Automatically removing outliers from agglomerative hierarchical clustering results

Question

In cluster analysis, the outliers of a dataset can easily be identified with the single-linkage method. Now I would like to remove the outliers automatically. My idea is to remove the observations that exceed a specified distance value. Here is my code, using the mtcars example data:

library(cluster)     # agnes()
library(dendextend)  # dendrogram utilities
cluster <- agnes(mtcars, stand = FALSE, method = "single")  # single linkage on unstandardized data
dend <- as.dendrogram(cluster)
plot(dend)

The plot shows the resulting dendrogram. The last 4 cars ("Duster 360", "Camaro Z28", "Ford Pantera L", "Maserati Bora") are identified as outliers, so I would like to remove their whole rows from the mtcars dataset. How can I do this automatically, for example by removing the rows that only join the tree above a height of 70? I've tried many ways of removing outliers, but none of them seemed to apply to my data.

Many thanks!
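
One way to choose the cut height is to look at the merge heights directly. The sketch below is not part of the original question or answer. It refits the same single-linkage model and converts it with as.hclust(). It then lists the largest merge heights and counts how many merges lie above the threshold of 70 taken from the question. Finally, it marks that threshold on the dendrogram with abline() and rect.hclust(). The names fit, hc and threshold are illustrative.

```r
library(cluster)

# Same single-linkage fit as in the question (Euclidean distance, unscaled data)
fit <- agnes(mtcars, stand = FALSE, method = "single")
hc  <- as.hclust(fit)   # "hclust" object; hc$height holds the merge heights

# The largest merge heights: large jumps suggest where outliers join the tree
print(round(sort(hc$height, decreasing = TRUE)[1:5], 1))

# For a monotone tree, cutting at h leaves (number of merges above h) + 1 groups
threshold <- 70
n_above <- sum(hc$height > threshold)
cat("merges above", threshold, ":", n_above, "\n")

# Dendrogram with the cut line and boxes around the groups it produces
plot(hc, main = "Single linkage, mtcars", xlab = "", sub = "")
abline(h = threshold, col = "red", lty = 2)
rect.hclust(hc, h = threshold)
```

If the printed heights show no clear gap, the threshold of 70 is a judgment call rather than something the data dictate.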

Recommended answer

Try this:

# your code (from the question)
library(cluster)
cluster <- agnes(mtcars, stand = FALSE, method = "single")
dend <- as.dendrogram(cluster)
plot(dend)

# new code
hclu <- as.hclust(cluster)            # convert the agnes object to an "hclust" object, which cutree() accepts
groupindexes <- cutree(hclu, h = 70)  # cut at height 70: creates 3 groups/branches
# cutree() numbers groups in row order, so group 1 is the cluster that contains the first row (Mazda RX4)
mtcars[groupindexes != 1, ]           # "outliers": not in group 1, i.e. in groups 2 and 3
mtcars[groupindexes == 1, ]           # all rows except the 4 "outliers"

Result 1 ("outliers"):

                mpg cyl disp  hp drat   wt  qsec vs am gear carb
Duster 360     14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
Camaro Z28     13.3   8  350 245 3.73 3.84 15.41  0  0    3    4
Ford Pantera L 15.8   8  351 264 4.22 3.17 14.50  0  1    5    4
Maserati Bora  15.0   8  301 335 3.54 3.57 14.60  0  1    5    8

Result 2:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
(... and 25 more rows ...)
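
Selecting groupindexes == 1 works for mtcars because the first row (Mazda RX4) happens to sit in the large "normal" cluster. A slightly more defensive sketch keeps the largest group instead. It assumes the non-outlying data always form the biggest cluster at the chosen height. The helper name remove_outlier_clusters() is hypothetical and not part of cluster or stats.

```r
library(cluster)

# Hypothetical helper: cut a single-linkage tree at height h and treat every
# observation outside the largest resulting cluster as an outlier.
remove_outlier_clusters <- function(data, h, method = "single") {
  fit    <- agnes(data, stand = FALSE, method = method)
  groups <- cutree(as.hclust(fit), h = h)              # one group id per row, in row order
  main   <- as.integer(names(which.max(table(groups)))) # id of the largest group
  keep   <- groups == main
  list(clean    = data[keep, , drop = FALSE],
       outliers = data[!keep, , drop = FALSE])
}

res <- remove_outlier_clusters(mtcars, h = 70)
print(rownames(res$outliers))
print(nrow(res$clean))
```

Picking the largest cluster avoids depending on which row comes first. If two clusters tie in size, which.max() simply takes the first.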
