Getting Database Attribute From KMeans Clustering WEKA

Question

I have a function that runs the k-means algorithm using WEKA.jar. The function works and prints the list of objects to my console, but I want to show a specific attribute for each instance in the k-means clustering result.

Here is my code:

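// importing required dependencies
import java.io.File;

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.EuclideanDistance;
import weka.core.Instance;
import weka.core.Instances;
import weka.experiment.InstanceQuery;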

public class KMeans {

/*get connection strings from database manager*/
private DatabaseManager datman = new DatabaseManager();

private String username = datman.getUsername(); //get username
private String password = datman.getPassword(); //get password

public void doProcess(){
    int n = 3;
    String queries = "SELECT idms_kodebarang, aksesoris, bahan, `QTY-SA-1`,`QTY-SA-2`,`QTY-SA-3`,`QTY-SA-4`,`harga` FROM mt_karakterproduk";

    try {
        InstanceQuery query = new InstanceQuery();
        File reader = new File("DatabaseUtils.props");
        query.setUsername(username);
        query.setPassword(password);
        query.setQuery(queries);
        query.initialize(reader);
        query.setSparseData(true);
        Instances Data = query.retrieveInstances();

        String[] options = weka.core.Utils.splitOptions("-I 100");

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setSeed(10);
        kmeans.setOptions(options);
        //this is the important parameter to set
        kmeans.setNumClusters(n);
        kmeans.setPreserveInstancesOrder(true);
        kmeans.buildClusterer(Data);

        EuclideanDistance Dist = (EuclideanDistance)kmeans.getDistanceFunction();
        Instances instances = kmeans.getClusterCentroids();
        //create cluster information print result
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(kmeans);

        for ( int i = 0; i < instances.numInstances(); i++ ) {
            // for each cluster center
            Instance inst = instances.instance( i );
            Double dist1 = Dist.distance(instances.firstInstance(), Data.instance(i));
            // as you mentioned, you only had 1 attribute
            // but you can iterate through the different attributes
            double value = inst.value( 0 );
            java.lang.System.out.println( "Value for centroid " + i + ": " + value + " ::: " +dist1);
        }

        java.lang.System.out.printf("Cluster Results \n =================== \n "+eval.clusterResultsToString());

        //this array returns the cluster number for each instance
        //the array has as many elements as the number of instances
        int[] assignments = kmeans.getAssignments();

        int i = 0;
        for(int clusternum : assignments){
            java.lang.System.out.printf("Instance %d - > cluster %d \n", i, clusternum);
            i++;
        }


    } catch (Exception e) {
        java.lang.System.out.println("Error On KMeans Analysis Exception : " + e.toString());
    }

}    

}

The result only shows this list:

  • INFO: Instance 0 - > cluster 2
  • INFO: Instance 2 - > cluster 2
  • INFO: Instance 4 - > cluster 1
  • INFO: Instance 6 - > cluster 2
  • INFO: Instance 8 - > cluster 2
  • INFO: Instance 10 - > cluster 1
  • INFO: Instance 12 - > cluster 2
  • INFO: Instance 14 - > cluster 0
  • INFO: Instance 16 - > cluster 1
  • INFO: Instance 18 - > cluster 1
  • INFO: Instance 20 - > cluster 1
  • INFO: Instance 22 - > cluster 1
  • INFO: Instance 24 - > cluster 0
  • INFO: Instance 26 - > cluster 0
  • INFO: Instance 28 - > cluster 1
  • INFO: Instance 30 - > cluster 1 ... etc..

I need the result to include not only the instance index but also specific attributes from the database, so that the output looks like this (taken from my Weka application):

 Cluster centroids:
                                   Cluster#
 Attribute              Full Data              0              1              2
                              (32)            (8)           (15)            (9)
  =============================================================================
  idms_kodebarang       E501245FF3       E613104F     E501247FF3     E501245FF3
  E501245FF3             1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E501247FF3             1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E820707F$KB            1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E820705F$KB            1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E5016B57FF             1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E5016B59FF             1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E820701F$KB            1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E613104F               1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  E820708F$KB            1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E521210F6              1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E5216B10F6             1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E501245C$3KB           1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E501247C$3KB           1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E501238FF3             1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E701601F               1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  E613105F               1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  E600201FC              1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E600105C               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E620201C               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E5016B57C$KB           1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E620501H               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E5016B59C$KB           1 (  3%)       0 (  0%)       0 (  0%)       1 ( 11%)
  E800601F               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E880201H               1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  E931301F               1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  G932201F$              1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  E840104FC              1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)
  E600300F               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E701104F               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E5016B50FF             1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E702201F               1 (  3%)       0 (  0%)       1 (  6%)       0 (  0%)
  E502415H6              1 (  3%)       1 ( 12%)       0 (  0%)       0 (  0%)

How can I achieve this?

Thanks in advance.

Recommended answer

Not sure if this is still relevant, but I hope it helps someone with a similar problem. I am working with the Weka K-Means clustering API too, and the ClusterEvaluation class should give you the output in the form you want. I tried it on the Iris dataset and got the following results.

Weka tool, K-Means clustering (with numOfClusters = 2):

=== Run information ===

Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     iris
Instances:    150
Attributes:   5
              sepallength
              sepalwidth
              petallength
              petalwidth
              class
Test mode:    evaluate on training data


=== Clustering model (full training set) ===


kMeans
======

Number of iterations: 7
Within cluster sum of squared errors: 62.1436882815797

Initial starting points (random):

Cluster 0: 6.1,2.9,4.7,1.4,Iris-versicolor
Cluster 1: 6.2,2.9,4.3,1.3,Iris-versicolor

Missing values globally replaced with mean/mode

Final cluster centroids:
                                          Cluster#
Attribute                Full Data               0               1
                           (150.0)         (100.0)          (50.0)
==================================================================
sepallength                 5.8433           6.262           5.006
sepalwidth                   3.054           2.872           3.418
petallength                 3.7587           4.906           1.464
petalwidth                  1.1987           1.676           0.244
class                  Iris-setosa Iris-versicolor     Iris-setosa




Time taken to build model (full training data) : 0.02 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      100 ( 67%)
1       50 ( 33%)

My own clusterer, using the Weka API on the same dataset, produced this result through the ClusterEvaluation class:

Cluster Evaluation results: 
kMeans
======

Number of iterations: 7
Within cluster sum of squared errors: 62.14368828157972

Initial starting points (random):

Cluster 0: 6.1,2.9,4.7,1.4,Iris-versicolor
Cluster 1: 6.2,2.9,4.3,1.3,Iris-versicolor

Missing values globally replaced with mean/mode

Final cluster centroids:
                                          Cluster#
Attribute                Full Data               0               1
                           (150.0)         (100.0)          (50.0)
==================================================================
sepallength                 5.8433           6.262           5.006
sepalwidth                   3.054           2.872           3.418
petallength                 3.7587           4.906           1.464
petalwidth                  1.1987           1.676           0.244
class                  Iris-setosa Iris-versicolor     Iris-setosa


Clustered Instances

0      100 ( 67%)
1       50 ( 33%)

I got the above output with the following steps:

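import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// load the dataset (Instances has no constructor that takes a file name)
Instances instances = DataSource.read("iris.arff");
SimpleKMeans simpleKMeans = new SimpleKMeans();

// build clusterer
simpleKMeans.setPreserveInstancesOrder(true);
simpleKMeans.setNumClusters(2);
simpleKMeans.buildClusterer(instances);

// evaluate the clusterer on the same data, then print the summary
ClusterEvaluation eval = new ClusterEvaluation();
eval.setClusterer(simpleKMeans);
eval.evaluateClusterer(instances);

System.out.println("Cluster Evaluation: " + eval.clusterResultsToString());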

The final print line prints the desired output. Hope this helps someone.
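If what you need is the database attribute next to each cluster number, rather than the summary table, one option is to pair the assignment array with the attribute values yourself. The following is only a minimal sketch in plain Java, not part of the Weka API: the class name ClusterGrouping, the method groupByCluster, and the sample arrays are hypothetical. In the question's code, the assignments would presumably come from kmeans.getAssignments() and each label from something like Data.instance(i).stringValue(0), assuming the first column (idms_kodebarang) is loaded as a nominal or string attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterGrouping {

    /**
     * Groups row labels by cluster number.
     * assignments[i] is the cluster of row i; labels[i] is the attribute value of row i.
     */
    public static Map<Integer, List<String>> groupByCluster(int[] assignments, String[] labels) {
        if (assignments.length != labels.length) {
            throw new IllegalArgumentException("assignments and labels must have the same length");
        }
        // TreeMap keeps the clusters sorted by cluster number
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            clusters.computeIfAbsent(assignments[i], k -> new ArrayList<>()).add(labels[i]);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; with Weka these arrays would be filled from
        // the clusterer's assignments and the chosen attribute of each instance.
        int[] assignments = {2, 2, 1, 0};
        String[] labels = {"E501245FF3", "E820707F$KB", "E501247FF3", "E613104F"};

        groupByCluster(assignments, labels)
            .forEach((cluster, ids) -> System.out.println("Cluster " + cluster + ": " + ids));
    }
}
```

This relies on the assignment array being in the same order as the loaded instances, which is why setPreserveInstancesOrder(true) matters before building the clusterer.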
