Optimizing K-means clustering using Genetic Algorithm

Problem description

I have the following dataset (obtained here):

          item survivalpoints weight
1  pocketknife             10      1
2        beans             20      5
3     potatoes             15     10
4       unions              2      1
5 sleeping bag             30      7
6         rope             10      5
7      compass             30      1
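
For readers who want to run the snippets that follow, here is a minimal sketch that rebuilds the table above as an R data frame. The name `dataset` matches the code in the question; the values are copied from the table as printed.

```r
## Rebuild the table shown above as a data frame.
## Values are copied verbatim from the printed table.
dataset <- data.frame(
  item = c("pocketknife", "beans", "potatoes", "unions",
           "sleeping bag", "rope", "compass"),
  survivalpoints = c(10, 20, 15, 2, 30, 10, 30),
  weight = c(1, 5, 10, 1, 7, 5, 1)
)

## Quick sanity check: 7 rows, 3 columns
dim(dataset)
```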

I can cluster this dataset into three clusters with kmeans(), using a binary string to mark which rows serve as the initial centers. For example:

## 1 marks a row used as an initial center
chromosome = c(1,1,1,0,0,0,0)
## exclude first column (kmeans only supports continuous data)
cl <- kmeans(dataset[, -1], dataset[chromosome == 1, -1])
## check the memberships
cl$cluster
# [1] 1 3 3 1 2 1 2

Building on this idea, I tried using the GA package to run the search, with the goal of minimizing the Davies-Bouldin (DB) index.
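
For reference, a sketch of the commonly used definition of the Davies-Bouldin index for k clusters is given here. The symbols are assumptions introduced for illustration: S_i is the average distance of the points in cluster i to its centroid, and d_ij is the distance between the centroids of clusters i and j. The value returned by index.DB() may differ depending on its distance settings.

```latex
DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{d_{ij}}
```

Lower values indicate compact, well-separated clusters, which is why the index is to be minimized.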

library(GA)           ## for ga() function
library(clusterSim)   ## for index.DB() function

## defining my fitness function (Davies-Bouldin)
DBI <- function(x) {
    ## converting matrix to vector to access each row
    binary_rep <- split(x, row(x))
    ## evaluate the fitness of each chromosome
    for(each in 1:nrow(x)){
        cl <- kmeans(dataset, dataset[binary_rep[[each]] == 1, -1])
        dbi <- index.DB(dataset, cl$cluster, centrotypes = "centroids")
        ## minimizing db
        return(-dbi)
    }
}

g<- ga(type = "binary", fitness = DBI, popSize = 100, nBits = nrow(dataset))

Not surprisingly (I have no idea what is happening), I received the following error message: Warning messages: Error in row(x) : a matrix-like object is required as argument to 'row'

Here are my questions:

  1. How can I correctly use the GA package to solve my problem?
  2. How can I make sure that the randomly generated chromosomes contain the same number of 1s as the number of clusters k (e.g. if k=3, the chromosome must contain exactly three 1s)?

Recommended answer

I can't comment on whether combining k-means with ga makes sense, but I can point out the issues in your fitness function. ga() passes each chromosome to the fitness function as a plain vector, not a matrix, which is why row(x) fails; the function should score that single vector and return one number. In addition, errors are produced when all genes are on or all are off, so the fitness is only calculated when that is not the case:

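DBI <- function(x) {
  ## x is a single chromosome (a 0/1 vector of length nrow(dataset))
  if (sum(x) == nrow(dataset) || sum(x) == 0) {
    ## all genes on or all off: skip clustering and use a fixed score
    score <- 0
  } else {
    ## rows flagged with 1 are the initial centers; drop the item column
    cl <- kmeans(dataset[, -1], dataset[x == 1, -1])
    dbi <- index.DB(dataset[, -1], cl = cl$cluster, centrotypes = "centroids")
    ## index.DB() returns a list; the index itself is in $DB
    score <- dbi$DB
  }

  return(score)
}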

g <- ga(type = "binary", fitness = DBI, popSize = 100, nBits = nrow(dataset))
plot(g)

g@solution
g@fitnessValue

It looks like several gene combinations produced the same "best" fitness value.
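
The second question (exactly k ones per chromosome) is not covered above. One possible approach, given here as a rough sketch rather than a tested recipe, is to seed ga() with an initial population that already satisfies the constraint and to penalize any chromosome that violates it, since crossover and mutation can change the number of ones. Note also that ga() maximizes the fitness, so minimizing the DB index means returning its negative. The names `k_ones_population` and `DBI_k`, the penalty value, and the `popSize`/`maxiter` settings are all assumptions made for illustration.

```r
library(GA)          # ga()
library(clusterSim)  # index.DB()

dataset <- data.frame(
  item = c("pocketknife", "beans", "potatoes", "unions",
           "sleeping bag", "rope", "compass"),
  survivalpoints = c(10, 20, 15, 2, 30, 10, 30),
  weight = c(1, 5, 10, 1, 7, 5, 1)
)

k <- 3           # desired number of clusters
penalty <- -1e6  # arbitrary "very bad" fitness for invalid chromosomes

## Hypothetical helper: initial population in which every chromosome
## (one per row) contains exactly k ones in random positions.
k_ones_population <- function(object, ...) {
  n_zeros <- object@nBits - k
  t(replicate(object@popSize, sample(rep(c(1, 0), c(k, n_zeros)))))
}

## Fitness for a single chromosome x. ga() maximizes, so the DB index
## is negated; chromosomes without exactly k ones receive the penalty.
DBI_k <- function(x) {
  if (sum(x) != k) return(penalty)
  score <- tryCatch({
    cl <- kmeans(dataset[, -1], dataset[x == 1, -1])
    index.DB(dataset[, -1], cl = cl$cluster, centrotypes = "centroids")$DB
  }, error = function(e) NA_real_)
  ## clustering failures or non-finite indices are also penalized
  if (!is.finite(score)) penalty else -score
}

set.seed(1)
g <- ga(type = "binary", fitness = DBI_k, population = k_ones_population,
        popSize = 50, nBits = nrow(dataset), maxiter = 30)

g@solution        # best chromosome(s); each row should have k ones
-g@fitnessValue   # corresponding DB index (sign flipped back)
```

A penalty keeps the standard binary crossover and mutation operators usable. Writing custom operators that preserve the number of ones would be the stricter alternative, at the cost of more code.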
