Spark MLlib example, NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame()


Problem description

I am following the documentation example Example: Estimator, Transformer, and Param.

And I got the error message:


15/09/23 11:46:51 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror; at SimpleApp$.main(hw.scala:75)


And line 75 is the code "sqlContext.createDataFrame()":

import java.util.Random

import org.apache.log4j.Logger
import org.apache.log4j.Level

import scala.io.Source

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd._


import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.recommendation.{ALS, Rating, MatrixFactorizationModel}
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // line 75 in the original hw.scala: this is where the NoSuchMethodError is thrown
    val training = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")
  }
}


And my sbt build file is as follows:

lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.4"
  )

libraryDependencies ++= {
    Seq(
        "org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
        "org.apache.spark" %% "spark-sql" % "1.4.1" % "provided",
        "org.apache.spark" % "spark-hive_2.11" % "1.4.1",
        "org.apache.spark"  % "spark-mllib_2.11" % "1.4.1" % "provided",
        "org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided",
        "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1" % "provided"
    )
}


I tried to search around and found this post, which is very similar to my issue. I tried changing my sbt settings for the Spark versions (spark-mllib_2.11 to 2.10, and Spark 1.4.1 to 1.5.0), but it caused even more dependency conflicts.


My intuition is that it's some version problem, but I cannot figure it out myself. Could anyone please help? Thanks a lot.

Answer


It's working for me now. Just for the record, referencing @MartinSenne's answer. (The NoSuchMethodError on scala.reflect.api.JavaUniverse.runtimeMirror is a typical symptom of a Scala binary version mismatch: the project was compiled with Scala 2.11 while the Spark 1.4.1 binaries were built against Scala 2.10.)

What I did is as follows:

  1. Clear all compiled files under the "project" folder.
  2. Use Scala version 2.10.4 (previously 2.11.4).
  3. Change spark-sql to: "org.apache.spark" %% "spark-sql" % "1.4.1" % "provided"
  4. Change MLlib to: "org.apache.spark" %% "spark-mllib" % "1.4.1" % "provided" (a build.sbt sketch follows below)
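
For reference, a corrected build.sbt following steps 2-4 could look like the sketch below. This is only an illustration, not the exact file I used: it keeps Spark at 1.4.1, drops the explicit _2.11 artifact names in favour of %%, and shows just the core/sql/mllib dependencies.

lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.10.4"  // must match the Scala version the Spark binaries were built against
  )

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix (_2.10) automatically, keeping all artifacts consistent
  "org.apache.spark" %% "spark-core"  % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.4.1" % "provided"
)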

@note:

  1. I've already started a Spark cluster, and I use "sh spark-submit /path_to_folder/hello/target/scala-2.10/hello_2.10-1.0.jar" to submit the jar to the Spark master. Running with sbt via the command "sbt run" will fail.
  2. When changing from Scala 2.11 to Scala 2.10, remember that the jar file path and name also change, from "scala-2.11/hello_2.11-1.0.jar" to "scala-2.10/hello_2.10-1.0.jar". When I re-packaged everything, I forgot to update the jar name in the submit command, so I packaged "hello_2.10-1.0.jar" but was still submitting "hello_2.11-1.0.jar", which caused me extra problems...
  3. I tried both "val sqlContext = new org.apache.spark.sql.SQLContext(sc)" and "val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)"; both work with the createDataFrame() method (see the sketch below).
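
As a minimal sketch of note 3, assuming the SparkContext sc from the question's main() is already created and reusing the same kind of training data (the HiveContext variant additionally needs spark-hive on the classpath):

import org.apache.spark.mllib.linalg.Vectors

// Plain SQLContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// HiveContext exposes the same createDataFrame() method:
// val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0))
)).toDF("label", "features")

training.show()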

