Printing ClusterID and its elements using Spark KMeans algo.

Views: 231  Posted: 2016/5/22 16:06:06  Tags: apache-spark k-means apache-spark-mllib

Question

I have this program, which prints the WSSSE (Within Set Sum of Squared Errors) of the KMeans algorithm on Apache Spark. It generates 20 clusters. I am trying to print each cluster ID together with the elements that were assigned to it. How do I loop over the cluster IDs to print their elements?

Thank you guys!!

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))

    // Load and parse the data: one comma-separated point per line
    val data = sc.textFile("kmeans.csv")
    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

    // Cluster the data into 20 clusters using KMeans
    val numIterations = 20
    val numClusters = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Format each center explicitly; concatenating the array itself would only print its object reference
    val clusterCenters = clusters.clusterCenters.map(_.toArray)
    println("The Cluster Centers are = " + clusterCenters.map(_.mkString("[", ", ", "]")).mkString(", "))

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

Answer

As far as I know, you should run predict on each element; it returns the ID of the cluster that element belongs to.

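    import java.util.List;
    import org.apache.spark.mllib.clustering.KMeans;
    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;

    // parsedData is assumed to be a JavaRDD<Vector>
    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
    // collect() pulls every vector to the driver, so this only suits small datasets
    List<Vector> vectors = parsedData.collect();
    for (Vector vector : vectors) {
        System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
    }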
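
To cover the "loop over the cluster IDs" part of the question in Scala, one option is to pair each vector with its predicted cluster ID and then group by that ID. The following is a minimal sketch, not the answerer's method: the object name `ClusterMembers`, the helper `groupByCluster`, and the two toy centers are made up for illustration, and it assumes the RDD-based `org.apache.spark.mllib` API used in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object ClusterMembers {
  // Hypothetical helper: pair each vector with its predicted cluster ID, then group by ID.
  def groupByCluster(model: KMeansModel, data: RDD[Vector]): RDD[(Int, Iterable[Vector])] =
    data.map(v => (model.predict(v), v)).groupByKey()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("ClusterMembers"))
    try {
      // Toy model with two fixed centers, standing in for the result of KMeans.train
      val model = new KMeansModel(Array(Vectors.dense(0.0, 0.0), Vectors.dense(10.0, 10.0)))
      val points = sc.parallelize(Seq(
        Vectors.dense(0.5, 0.1), Vectors.dense(9.0, 9.5), Vectors.dense(-1.0, 0.3)))

      // collect() is only safe here because the toy data is tiny
      groupByCluster(model, points).collect().sortBy(_._1).foreach { case (id, members) =>
        println(s"cluster $id: ${members.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

With the variables from the question, the call would be `groupByCluster(clusters, parsedData)`. Note that `groupByKey` gathers all members of a cluster on a single executor, so for very large clusters it may be safer to keep the `(clusterId, vector)` pairs ungrouped and write them out with `saveAsTextFile`.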
