Printing ClusterID and its elements using Spark KMeans algo.

Problem description

I have this program, which prints the WSSSE (Within Set Sum of Squared Errors) of the KMeans algorithm on Apache Spark. It generates 20 clusters. I am trying to print each cluster ID together with the elements that were assigned to it. How do I loop over the cluster IDs to print their elements?

Thank you guys!!

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))

    // Load and parse the data: one comma-separated vector per line
    val data = sc.textFile("kmeans.csv")
    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

    // Cluster the data into numClusters clusters using KMeans
    val numIterations = 20
    val numClusters = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Format each center as a list; concatenating the array itself only prints its reference
    val clusterCenters = clusters.clusterCenters map (_.toArray)
    println("The Cluster Centers are = " + clusterCenters.map(_.mkString("[", ", ", "]")).mkString(", "))

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

Solution

As far as I know, you should run predict on each element. Using the Java API:

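    // parsedData is assumed to be a JavaRDD<Vector>; train() expects the underlying Scala RDD
    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    // collect() brings every vector to the driver, so this suits small data sets only
    List<Vector> vectors = parsedData.collect();
    for (Vector vector : vectors) {
        // predict() returns the index (cluster ID) of the nearest center
        System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
    }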
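
Since the question is written in Scala and asks for the elements grouped under each cluster ID, the same idea can be sketched in Scala by pairing every vector with its predicted cluster and grouping by that key. This is only a sketch under assumptions: the object name PrintClusterMembers is hypothetical, the input file and parameters are taken from the question, and the data is assumed small enough to collect on the driver.

```scala
import org.apache.spark.SparkContext
// Brings in the pair-RDD implicits (groupByKey, sortByKey) on older Spark releases
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical driver object, for illustration only
object PrintClusterMembers {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "KMeansExample")

    // Same parsing as in the question: one comma-separated vector per line
    val parsedData = sc.textFile("kmeans.csv")
      .map(s => Vectors.dense(s.split(',').map(_.toDouble)))
      .cache()

    val numClusters = 20
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Pair each vector with the ID of its nearest center, then group by that ID
    val byCluster = parsedData
      .map(v => (clusters.predict(v), v))
      .groupByKey()
      .sortByKey()

    // collect() moves all groups to the driver; acceptable for small data only
    byCluster.collect().foreach { case (clusterId, members) =>
      println("cluster " + clusterId + ":")
      members.foreach(v => println("  " + v))
    }

    sc.stop()
  }
}
```

For larger data sets, it would probably be safer to write the (clusterId, vector) pairs out with saveAsTextFile rather than collecting them, since groupByKey followed by collect holds every element in driver memory.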
