Printing the cluster ID and its elements using the Spark KMeans algorithm
Problem description
I have this program which prints the within-set sum of squared errors (WSSSE) of the KMeans algorithm on Apache Spark. It generates 20 clusters. I am trying to print each cluster ID together with the elements that were assigned to it. How do I loop over the cluster IDs to print their elements?
Thank you guys!!
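import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))
// Load and parse the data: one comma-separated point per line
val data = sc.textFile("kmeans.csv")
// KMeans is iterative, so cache the parsed input
val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))).cache()
// Cluster the data into 20 clusters using KMeans
val numIterations = 20
val numClusters = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)
// Printing an Array directly only shows its reference (e.g. [[D@...), so format each center explicitly
val clusterCenters = clusters.clusterCenters.map(_.toArray)
println("The Cluster Centers are = " + clusterCenters.map(_.mkString("[", ",", "]")).mkString(", "))
// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)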
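For reference, here is what computeCost measures, assuming MLlib's default Euclidean distance: the sum, over the n input points x_i, of the squared distance to the nearest of the K cluster centers c_k (K = numClusters, with cluster IDs numbered from 0). The cluster ID that the model assigns to a point is the index of that same nearest center, so the evaluation and the assignment come from one minimization.

```latex
\mathrm{WSSSE} = \sum_{i=1}^{n} \min_{0 \le k \le K-1} \lVert x_i - c_k \rVert^2,
\qquad
\mathrm{clusterID}(x_i) = \operatorname*{arg\,min}_{0 \le k \le K-1} \lVert x_i - c_k \rVert^2
```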
Solution
As far as I know, you should run predict on each element:
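import java.util.List;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
// parsedData is a JavaRDD<Vector>; collect() pulls every point to the driver, so only do this for small data
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
List<Vector> vectors = parsedData.collect();
for (Vector vector : vectors) {
    System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
}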
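The loop above prints one line per point in input order. To loop over cluster IDs instead, the points can first be grouped by their predicted ID. The sketch below is a local, Spark-free illustration of that pattern in Scala: the hypothetical helper nearestCenter stands in for the model's predict (nearest center by squared Euclidean distance), and the hypothetical groupByCluster collects each cluster's members. The centers and points are toy values made up for the example. With the Spark RDD from the question, the same grouping would be roughly parsedData.map(v => (clusters.predict(v), v)).groupByKey(), assuming Spark 1.x MLlib as used there.

```scala
// Local, Spark-free sketch: assign each point to its nearest center, then loop over cluster IDs.
// nearestCenter and groupByCluster are hypothetical helpers, not part of Spark's API.
object ClusterMembers {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the closest center, i.e. the cluster ID (ties go to the lower index)
  def nearestCenter(centers: Array[Point], p: Point): Int =
    centers.indices.minBy(i => squaredDistance(centers(i), p))

  // Map from cluster ID to the points assigned to it, keeping input order within each cluster
  def groupByCluster(centers: Array[Point], points: Seq[Point]): Map[Int, Seq[Point]] =
    points.groupBy(p => nearestCenter(centers, p))

  def main(args: Array[String]): Unit = {
    // Toy data made up for this example
    val centers = Array(Array(0.0, 0.0), Array(10.0, 10.0))
    val points = Seq(Array(1.0, 0.5), Array(9.0, 11.0), Array(-0.5, 0.0), Array(10.5, 9.5))

    val byCluster = groupByCluster(centers, points)
    // Loop over the cluster IDs in order and print the elements of each cluster
    for (id <- byCluster.keys.toSeq.sorted) {
      println(s"cluster $id")
      byCluster(id).foreach(p => println("  " + p.mkString("[", ",", "]")))
    }
  }
}
```

With the toy data, this prints cluster 0 followed by [1.0,0.5] and [-0.5,0.0], then cluster 1 followed by [9.0,11.0] and [10.5,9.5]. Grouping first means each cluster ID is visited once, rather than repeating the ID on every output line.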