Running KMeans clustering in PySpark
Question
It's my very first time trying to run a KMeans cluster analysis in Spark, so I'm sorry for a stupid question.
I have a Spark dataframe mydataframe with many columns. I want to run kmeans on only two columns: lat and long (latitude & longitude), using them as simple values. I want to extract 7 clusters based on just those 2 columns. I've tried:
from numpy import array
from math import sqrt
from pyspark.mllib.clustering import KMeans, KMeansModel
# Prepare a data frame with just 2 columns:
data = mydataframe.select('lat', 'long')
# Build the model (cluster the data)
clusters = KMeans.train(data, 7, maxIterations=15, initializationMode="random")
But I get an error:
'DataFrame' object has no attribute 'map'
What should be the object one feeds to KMeans.train? Clearly, it doesn't accept a DataFrame. How should I prepare my dataframe for the analysis?
Thank you very much!
Answer
The method KMeans.train takes an RDD as input, not a DataFrame (data). So you just have to convert data to an RDD: data.rdd. Hope it helps.