Scala Spark type mismatch: found Unit, required rdd.RDD
Question
I am reading a table from a MySQL database in a Spark project written in Scala. It's my first week with it, so I am still finding my way around. When I try to run
val clusters = KMeans.train(parsedData, numClusters, numIterations)
I get an error for parsedData that says: "type mismatch; found: org.apache.spark.rdd.RDD[Map[String,Any]] required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]"
My parsed data is created above like this:
val parsedData = dataframe_mysql.map(_.getValuesMap[Any](List("name", "event","execution","info"))).collect().foreach(println)
where dataframe_mysql is whatever is returned from the sqlcontext.read.format("jdbc").option(....) call.
How am I supposed to convert my Unit so that it meets the requirements of the train function?
According to the documentation, I am supposed to use something like this:
data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
Am I supposed to transform my values to Double? When I try to run the command above, my project crashes.
Thanks!
Recommended answer
Remove the trailing .collect().foreach(println). After calling collect, you no longer have an RDD; the data is pulled back to the driver as a local collection (an Array).
Subsequently, when you call foreach, it returns Unit. foreach exists only for side effects, such as printing each element of a collection, so parsedData ends up holding Unit instead of an RDD.
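To make the fix concrete, here is a minimal sketch of keeping parsedData as an RDD and turning each row into a Vector. It rests on assumptions that are not in the question: the object name KMeansInput, the helper cellToDouble, and the choice of "execution" and "info" as numeric feature columns are all hypothetical. Text columns such as names would need some numeric encoding first, which is not shown here.

```scala
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object KMeansInput {
  // Hypothetical choice: treat these columns as the numeric features.
  val featureColumns: Seq[String] = Seq("execution", "info")

  // Convert one cell to Double. Assumes the value is a number or a
  // numeric string; anything else (including null) is rejected.
  def cellToDouble(value: Any): Double = value match {
    case n: java.lang.Number => n.doubleValue()
    case s: String           => s.trim.toDouble
    case other =>
      throw new IllegalArgumentException(s"Cannot convert to Double: $other")
  }

  // Stay in RDD land: no collect(), no foreach(), so the result is
  // RDD[Vector], which is the type KMeans.train requires.
  def toFeatures(df: DataFrame): RDD[Vector] =
    df.select(featureColumns.head, featureColumns.tail: _*)
      .rdd
      .map(row => Vectors.dense(row.toSeq.map(cellToDouble).toArray))
      .cache()

  def cluster(df: DataFrame, numClusters: Int, numIterations: Int): KMeansModel = {
    val parsedData = toFeatures(df)
    // Inspect a few rows as a separate statement; its Unit result is discarded.
    parsedData.take(5).foreach(println)
    KMeans.train(parsedData, numClusters, numIterations)
  }
}
```

With this in place, something like KMeansInput.cluster(dataframe_mysql, numClusters, numIterations) should type-check, because the value passed to KMeans.train is an RDD[Vector] and the printing happens in its own statement.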