Refitting clusters around fixed centroids


Problem description

Clustering/classification problem: I used k-means clustering to generate the clusters and centroids shown below.

This is the dataset with the added cluster attribute from the initial run:

  > dput(sampledata)
    structure(list(Player = structure(1:5, .Label = c("A", "B", "C", 
    "D", "E"), class = "factor"), Metric.1 = c(0.3938961, 0.28062338, 
    0.32532626, 0.29239642, 0.25622558), Metric.2 = c(0.00763359, 
    0.01172354, 0.40550867, 0.04026846, 0.05976367), Metric.3 = c(0.50766075, 
    0.20345662, 0.06267444, 0.08661417, 0.17588925), cluster = c(1L, 
    2L, 3L, 2L, 2L)), .Names = c("Player", "Metric.1", "Metric.2", 
    "Metric.3", "cluster"), row.names = c(NA, -5L), class = "data.frame")


These are the cluster details computed from the 3 metrics:

> dput (scluster)
structure(list(cluster = c(1L, 2L, 3L, 2L, 2L), centers = structure(c(0.3938961, 
0.276415126666667, 0.32532626, 0.00763359, 0.03725189, 0.40550867, 
0.50766075, 0.155320013333333, 0.06267444), .Dim = c(3L, 3L), .Dimnames = list(
    c("1", "2", "3"), c("Metric.1", "Metric.2", "Metric.3"))), 
    totss = 0.252759813332907, withinss = c(0, 0.00930902482096013, 
    0), tot.withinss = 0.00930902482096013, betweenss = 0.243450788511947, 
    size = c(1L, 3L, 1L), iter = 1L, ifault = 0L), .Names = c("cluster", 
"centers", "totss", "withinss", "tot.withinss", "betweenss", 
"size", "iter", "ifault"), class = "kmeans")

Data with cluster attribute and centroids

I want to fix these centroids after the first clustering run, so that they serve as fixed references for future data. That way I can see how players move from one cluster to another as their metrics change, and thereby track their progress or regress.

Specifically, if player A's metrics change so that A now resembles cluster 2 rather than cluster 1, based on the Euclidean distance to the respective fixed centroids, I should see player A move to cluster 2. In other words, the data points would be refitted around the fixed centroids obtained from the first run.

This should help users to know how to approach such a data mining problem. Any pointers would be greatly appreciated! Thank you.

Solution

Here you go:

# install a couple of packages needed for the example
library(devtools)
devtools::install_github("alexwhitworth/emclustr")
devtools::install_github("alexwhitworth/imputation")
library(emclustr)
library(imputation)

# generate some example data -- 30 points in 3 2-dimensional clusters
# clusters are MVN
set.seed(123)
x <- rbind(gen_clust(10, 2, c(-5,5), c(1,1)),
           gen_clust(10, 2, c(0,0), c(1,1)),
           gen_clust(10, 2, c(5,5), c(1,1)))

# get initial centroids
km <- kmeans(x, centers= 3)$centers

# generate a new set of example data, in this case a "subsequent step"
# from your time-series
x2 <- rbind(gen_clust(10, 2, c(-4,-4), c(1,1)),
           gen_clust(10, 2, c(1,1), c(1,1)),
           gen_clust(10, 2, c(4,4), c(1,1)))

# calculate the Euclidean distance of each point to each centroid
# and evaluate nearest distance
d_km <- as.data.frame(cbind(dist_q.matrix(x= rbind(km[1,], x2), ref= 1L, q=2),
              dist_q.matrix(x= rbind(km[2,], x2), ref= 1L, q=2),
              dist_q.matrix(x= rbind(km[3,], x2), ref= 1L, q=2)))
names(d_km) <- c("dist_centroid1", "dist_centroid2", "dist_centroid3")
d_km$clust <- apply(d_km, 1, which.min)

# plot the centroids and the new points "x2" to show the results
plot(km, pch= 11, xlim= c(-6,6), ylim= c(-6,6))
points(x2, col= factor(d_km$clust))
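
For reference, the same nearest-centroid assignment can also be sketched in base R, without the two GitHub packages, using the centroids posted in the question. This is only a minimal sketch: `assign_to_centroids`, `later` and `moves` are hypothetical names introduced here, and the "later" metric values for player A are made up purely to illustrate a move from cluster 1 to cluster 2.

```r
# Fixed centroids from the first k-means run (values copied from scluster$centers)
centers <- matrix(
  c(0.3938961, 0.276415126666667, 0.32532626,
    0.00763359, 0.03725189, 0.40550867,
    0.50766075, 0.155320013333333, 0.06267444),
  nrow = 3,
  dimnames = list(c("1", "2", "3"), c("Metric.1", "Metric.2", "Metric.3"))
)

# Hypothetical helper: label each row of `newdata` with its nearest fixed centroid
assign_to_centroids <- function(newdata, centers) {
  newdata <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  # squared Euclidean distance from every point to every centroid
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(newdata, 2, centers[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(newdata))  # keep matrix shape for one-row input
  # the smallest distance is the largest negated distance
  as.integer(rownames(centers)[max.col(-d2, ties.method = "first")])
}

# The data from the question, with the cluster labels of the initial run
sampledata <- data.frame(
  Player   = c("A", "B", "C", "D", "E"),
  Metric.1 = c(0.3938961, 0.28062338, 0.32532626, 0.29239642, 0.25622558),
  Metric.2 = c(0.00763359, 0.01172354, 0.40550867, 0.04026846, 0.05976367),
  Metric.3 = c(0.50766075, 0.20345662, 0.06267444, 0.08661417, 0.17588925),
  cluster  = c(1L, 2L, 3L, 2L, 2L)
)

# Made-up later snapshot: player A's metrics drift toward centroid 2
later <- sampledata
later$Metric.1[later$Player == "A"] <- 0.28
later$Metric.2[later$Player == "A"] <- 0.03
later$Metric.3[later$Player == "A"] <- 0.16
later$cluster <- assign_to_centroids(later, centers)

# Compare old and new labels to see who changed cluster
moves <- data.frame(Player = sampledata$Player,
                    before = sampledata$cluster,
                    after  = later$cluster)
moves[moves$before != moves$after, ]
```

Because the centroids are never re-estimated, the cluster labels keep the same meaning across snapshots, so a change in a player's label reflects a change in that player's metrics rather than a shift in the clusters themselves.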
