Weighted k-means in R

Question

I want to run k-means clustering on a dataset (Sample_Data) with three variables (columns), like the one below:

     A  B  C
1    12 10 1
2    8  11 2
3    14 10 1
.    .   .  .
.    .   .  .
.    .   .  .

Typically, after scaling the columns and choosing the number of clusters, I would call kmeans() in R:

Sample_Data <- scale(Sample_Data)                               # standardize each column
output_kmeans <- kmeans(Sample_Data, centers = 5, nstart = 50)  # 5 clusters, 50 random starts

But what if some variables matter more than others? Suppose variable (column) A is more important than the other two. How can I include such weights in the model? Thank you all.

Answer

You can use a weighted k-means clustering, such as the one provided by the flexclust package:

https://cran.r-project.org/web/packages/flexclust/flexclust.pdf

The function:

cclust(x, k, dist = "euclidean", method = "kmeans",
weights=NULL, control=NULL, group=NULL, simple=FALSE,
save.data=FALSE)

It performs "k-means clustering, hard competitive learning or neural gas on a data matrix". The documentation describes the weights argument as "An optional vector of weights to be used in the fitting process. Works only in combination with hard competitive learning."

A toy example using the iris data:

library(flexclust)
data(iris)
# cluster the four numeric columns (column 5, Species, is dropped)
cl <- cclust(iris[, -5], k = 3, save.data = TRUE,
             weights = c(1, 0.5, 1, 0.1), method = "hardcl")
cl
    kcca object of family ‘kmeans’ 

    call:
    cclust(x = iris[, -5], k = 3, method = "hardcl", weights = c(1, 0.5, 1, 0.1), save.data = TRUE)

    cluster sizes:

     1  2  3 
    50 59 41 

As the output of cclust shows, the family is still kmeans even when hard competitive learning is used. The difference lies in how the cluster centers are updated during the training phase:

If method is "kmeans", the classic kmeans algorithm as given by MacQueen (1967) is used, which works by repeatedly moving all cluster centers to the mean of their respective Voronoi sets. If "hardcl", on-line updates are used (AKA hard competitive learning), which work by randomly drawing an observation from x and moving the closest center towards that point (e.g., Ripley 1996).
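To make the "hardcl" description concrete, here is a minimal base-R sketch of online hard competitive learning: draw one observation, find the closest center, and move only that center a step towards the observation. It is not flexclust's implementation. The function name hardcl_sketch, the iteration count, the decaying learning-rate schedule and the obs_weights argument (per-observation weights used as sampling probabilities) are all assumptions made for illustration.

```r
# Minimal sketch of hard competitive learning (online k-means) in base R.
# NOT flexclust's code: iteration count, learning-rate schedule and the
# obs_weights argument (sampling probabilities per row) are assumptions.
hardcl_sketch <- function(x, k, iter = 5000, rate0 = 0.5,
                          obs_weights = NULL, seed = 1) {
  set.seed(seed)                      # reproducible draws (sets the global RNG)
  x <- as.matrix(x)
  n <- nrow(x)
  # start from k distinct randomly chosen observations
  centers <- x[sample.int(n, k), , drop = FALSE]
  for (step in seq_len(iter)) {
    # draw one observation; rows with larger obs_weights are drawn more often
    i <- sample.int(n, 1, prob = obs_weights)
    # squared Euclidean distance from the drawn row to every center
    d <- colSums((t(centers) - x[i, ])^2)
    j <- which.min(d)
    # hypothetical decay: large steps early, small steps late
    rate <- rate0 * (1 - step / iter) + 0.01
    # move only the winning center towards the observation
    centers[j, ] <- centers[j, ] + rate * (x[i, ] - centers[j, ])
  }
  # final assignment of every observation to its nearest center (n x k matrix)
  dist2 <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
  list(centers = centers, cluster = max.col(-dist2, ties.method = "first"))
}

fit_hcl <- hardcl_sketch(scale(iris[, -5]), k = 3)
table(fit_hcl$cluster, iris$Species)
```

Because only the winning center moves at each step, the result depends on the random draws; rerunning with a different seed can give a different partition, so comparing several runs is worthwhile.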

The weights parameter is just a numeric vector; in general I use values between 0.01 (minimum weight) and 1 (maximum weight).
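The flexclust documentation only describes weights as "weights to be used in the fitting process"; it does not state that they apply per variable. If the goal is explicitly the one in the question, making column A count more in the distance, a common alternative that needs only base R is to rescale the standardized columns. Multiplying column j by sqrt(w_j) turns the squared Euclidean distance into sum_j w_j (x_j - c_j)^2, so ordinary kmeans() then minimizes a variable-weighted objective. This is a sketch under that interpretation. The helper name weighted_columns and the weight values are hypothetical.

```r
# Sketch: variable (column) weights for k-means via column rescaling.
# Assumption: w gives the relative importance of each column, and the goal is
# to minimize sum_i sum_j w_j * (x_ij - c_kj)^2 with base R's kmeans().
weighted_columns <- function(x, w) {
  x <- scale(x)                        # put all columns on a common scale first
  stopifnot(length(w) == ncol(x), all(w >= 0))
  # multiply column j by sqrt(w_j): squared Euclidean distance on the result
  # equals the w-weighted sum of squared differences on the scaled data
  sweep(x, 2, sqrt(w), `*`)
}

set.seed(123)
# hypothetical column weights for the four numeric iris columns
w <- c(1, 0.5, 1, 0.1)
xw <- weighted_columns(iris[, -5], w)
fit <- kmeans(xw, centers = 3, nstart = 50)
table(fit$cluster, iris$Species)
```

For the data in the question, making column A twice as important as B and C would look like kmeans(weighted_columns(Sample_Data, c(2, 1, 1)), centers = 5, nstart = 50). Keep in mind that fit$centers are then expressed in the rescaled space, not in the original units.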
