Spark MLlib example, NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame()
Problem description
I am following the documentation example and I get this error:
15/09/23 11:46:51 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror; at SimpleApp$.main(hw.scala:75)
Line 75 is the call to "sqlContext.createDataFrame()":
import java.util.Random
import org.apache.log4j.Logger
import org.apache.log4j.Level
import scala.io.Source
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd._
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.recommendation.{ALS, Rating, MatrixFactorizationModel}
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val training = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")
  }
}
My build.sbt is as follows:
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.4"
  )

libraryDependencies ++= {
  Seq(
    "org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.4.1" % "provided",
    "org.apache.spark" % "spark-hive_2.11" % "1.4.1",
    "org.apache.spark" % "spark-mllib_2.11" % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1" % "provided"
  )
}
I searched around and found this post, which is very similar to my issue. I tried changing the Spark versions in my sbt settings (spark-mllib_2.11 to _2.10, and spark 1.4.1 to 1.5.0), but that produced even more dependency conflicts.
My intuition is that it's some version problem, but I cannot figure it out myself. Could anyone please help? Thanks a lot.
Recommended answer
It's working now for me. Just for the record, this follows @MartinSenne's answer.
What I did was the following:
- clear all compiled files under the "project" folder
- Scala version 2.10.4 (previously 2.11.4)
- change spark-sql to: "org.apache.spark" %% "spark-sql" % "1.4.1" % "provided"
- change MLlib to: "org.apache.spark" %% "spark-mllib" % "1.4.1" % "provided"
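Putting those changes together, the whole build.sbt would look roughly like this (a sketch under the assumptions above: scalaVersion 2.10.4, and %% everywhere so sbt resolves the matching _2.10 artifacts instead of hard-coded _2.11 ones; the other dependencies from the question are kept unchanged):

```scala
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    // must match the Scala binary version the Spark 1.4.1 artifacts were built against
    scalaVersion := "2.10.4"
  )

libraryDependencies ++= {
  Seq(
    // %% appends the Scala binary suffix (_2.10) automatically,
    // so it can never disagree with scalaVersion above
    "org.apache.spark" %% "spark-core"  % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-sql"   % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-hive"  % "1.4.1",
    "org.apache.spark" %% "spark-mllib" % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided",
    "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1" % "provided"
  )
}
```

The original NoSuchMethodError on scala.reflect.api.JavaUniverse.runtimeMirror is the classic symptom of mixing Scala 2.10 and 2.11 binaries on the same classpath, which is exactly what the hard-coded _2.11 suffixes allowed.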
@note:
- I've already started a Spark cluster, and I use "sh spark-submit /path_to_folder/hello/target/scala-2.10/hello_2.10-1.0.jar" to submit the jar to the Spark master. Running via sbt with the command "sbt run" will fail.
- when changing from scala-2.11 to scala-2.10, remember that the jar file path and name also change, from "scala-2.11/hello_2.11-1.0.jar" to "scala-2.10/hello_2.10-1.0.jar". When I re-packaged everything, I forgot to change the jar name in the submit command, so I packaged "hello_2.10-1.0.jar" but submitted "hello_2.11-1.0.jar", which caused me extra problems...
- I tried both "val sqlContext = new org.apache.spark.sql.SQLContext(sc)" and "val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)"; both work with the method createDataFrame()
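For reference, the HiveContext variant differs only in how the context is constructed, since HiveContext extends SQLContext. A minimal sketch (assuming the same imports and SparkContext `sc` as in the question, plus the spark-hive dependency on the classpath; this needs a Spark runtime to execute):

```
// HiveContext inherits createDataFrame from SQLContext,
// so the rest of the example is unchanged
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0))
)).toDF("label", "features")
```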