Scala Spark type mismatch: found Unit, required rdd.RDD

Views: 277 | Published: 2020/8/11 8:25:01 | Tags: mysql, scala, apache-spark, type-mismatch, training-data

Problem description

I am reading a table from a MySQL database in a Spark project written in Scala. It's my first week working with it, so I am still not very comfortable. When I try to run

  val clusters = KMeans.train(parsedData, numClusters, numIterations)

I get an error for parsedData that says: "type mismatch; found: org.apache.spark.rdd.RDD[Map[String,Any]] required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]"

parsedData is created earlier in the code like this:

 val parsedData = dataframe_mysql.map(_.getValuesMap[Any](List("name", "event","execution","info"))).collect().foreach(println)

where dataframe_mysql is whatever is returned by the sqlcontext.read.format("jdbc").option(....) call.
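A note on the line that builds parsedData: in Scala, `foreach` returns `Unit`, so a chain that ends in `.collect().foreach(println)` leaves `Unit` in the val, which would explain the "found Unit" in the title. Below is a minimal sketch of that behaviour, using plain Scala collections as a stand-in for the Spark objects. The object name `ForeachVsMap` and the sample rows are made up for illustration.

```scala
object ForeachVsMap {
  // Stand-in for the maps built from each database row (hypothetical values).
  val rows: Seq[Map[String, Any]] = Seq(
    Map[String, Any]("name" -> "a", "event" -> 1),
    Map[String, Any]("name" -> "b", "event" -> 2)
  )

  def main(args: Array[String]): Unit = {
    // foreach runs println for its side effect and returns Unit,
    // so `printed` holds no data at all.
    val printed: Unit = rows.foreach(println)

    // Keeping the collection in its own val and printing in a separate
    // statement preserves the data for later use.
    val kept: Seq[Map[String, Any]] = rows
    kept.foreach(println)
    println(s"rows kept: ${kept.size}")
  }
}
```

The same reasoning suggests assigning the mapped result to parsedData first and printing it in a separate statement.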

How am I supposed to convert my Unit so that it meets the requirements of the train function?

According to the documentation, I am supposed to use something like this:

data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

Am I supposed to transform my values to Double? When I try to run the command above, my project crashes.
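On the Double question: `Vectors.dense` accepts an `Array[Double]`, so each row's values do need to end up as doubles, and text that cannot be parsed as a number will throw if `.toDouble` is called on it directly. The sketch below shows one possible way to convert a `Map[String, Any]` row defensively in plain Scala. The names `RowToDoubles`, `toDouble` and `features`, and the sample row, are hypothetical, and which columns actually hold numbers is an assumption that depends on the table.

```scala
import scala.util.Try

object RowToDoubles {
  // Converts one value to Double; returns None for anything non-numeric.
  def toDouble(value: Any): Option[Double] = value match {
    case n: Number => Some(n.doubleValue())        // Int, Long, Double, BigDecimal, ...
    case s: String => Try(s.trim.toDouble).toOption // numeric text such as "12.5"
    case _         => None                          // null or unsupported types
  }

  // Builds a feature array from the chosen columns, or None if any
  // value is missing or cannot be converted.
  def features(row: Map[String, Any], columns: Seq[String]): Option[Array[Double]] = {
    val converted = columns.map(c => row.get(c).flatMap(toDouble))
    if (converted.forall(_.isDefined)) Some(converted.flatten.toArray) else None
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical row: "name" is free text, the other two are numeric.
    val row = Map[String, Any]("name" -> "job-1", "event" -> 3, "execution" -> "12.5")

    // Only numeric columns can become features.
    println(features(row, Seq("event", "execution")).map(_.mkString(", ")))
    // A text column makes the whole row unusable in this sketch.
    println(features(row, Seq("name", "event")))
  }
}
```

In a Spark job, each resulting array could then be passed to `Vectors.dense` inside the `map`, as in the documentation snippet above, and rows that yield `None` could be dropped or handled separately.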

Thanks!
