java.lang.NoSuchMethodError: scala.Predef$.refArrayOps in Spark job with Scala


Problem description


Full error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object; at org.spark_module.SparkModule$.main(SparkModule.scala:62) at org.spark_module.SparkModule.main(SparkModule.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

When I compile and run the code in IntelliJ, it executes fine all the way through. The error shows up when I submit the .jar as a Spark job (at runtime).

Line 62 contains: for ((elem, i) <- args.zipWithIndex). I commented out the rest of the code to be sure, and the error kept showing on that line.

At first I thought it was zipWithIndex's fault. Then I changed it to for (elem <- args) and, guess what, the error still showed. Is the for causing this?

Google searching always points to a Scala version incompatibility between the version used to compile and the version used at runtime, but I can't figure out a solution.

I tried this to check the Scala version used by IntelliJ, and here is everything Scala-related under Modules > Scala:

Then I did this to check the runtime version of Scala, and the output is:

(file:/C:/Users/me/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-library/2.12.11/1a0634714a956c1aae9abefc83acaf6d4eabfa7d/scala-library-2.12.11.jar)

Versions seem to match...

This is my gradle.build (it includes the fatJar task):

group 'org.spark_module'
version '1.0-SNAPSHOT'

apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    implementation group: 'org.scala-lang', name: 'scala-library', version: '2.12.11'
    implementation group: 'org.apache.spark', name: 'spark-core_2.12'//, version: '2.4.5'
    implementation group: 'org.apache.spark', name: 'spark-sql_2.12'//, version: '2.4.5'
    implementation group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.12', version: '2.5.0'
    implementation group: 'org.apache.spark', name: 'spark-mllib_2.12', version: '2.4.5'
    implementation group: 'log4j', name: 'log4j', version: '1.2.17'
    implementation group: 'org.scalaj', name: 'scalaj-http_2.12', version: '2.4.2'
}

task fatJar(type: Jar) {
    zip64 true
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    } {
        exclude "META-INF/*.SF"
        exclude "META-INF/*.DSA"
        exclude "META-INF/*.RSA"
    }

    manifest {
        attributes 'Main-Class': 'org.spark_module.SparkModule'
    }

    with jar
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

jar {
    zip64 true
    getArchiveFileName()
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    manifest {
        attributes 'Main-Class': 'org.spark_module.SparkModule'
    }

    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'

}
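
As a side note, one way to cross-check which scala-library jar Gradle actually resolves for this build is a small helper task like the one below; this is only a sketch (the task name and approach are not part of the original build file):

task printScalaLibrary {
    doLast {
        // Print every scala-library jar that Gradle resolved onto the runtime classpath
        configurations.runtimeClasspath
                .filter { it.name.startsWith('scala-library') }
                .each { println it }
    }
}

Running gradlew printScalaLibrary should list a single scala-library-2.12.x jar if the build itself is consistent; the mismatch in this question is against the Scala version bundled with the spark-submit installation, which this task cannot see.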

To build the (fat) jar:

gradlew fatJar

in IntelliJ's terminal.

To run the job:

spark-submit.cmd .\SparkModule-1.0-SNAPSHOT.jar

in Windows PowerShell.

Thank you

EDIT:

spark-submit.cmd and spark-shell.cmd both show Scala version 2.11.12, so yes, they differ from the one I am using in IntelliJ (2.12.11). The problem is, on Spark's download page there is only one Spark distribution for Scala 2.12, and it comes without Hadoop; does that mean I have to downgrade from 2.12 to 2.11 in my gradle.build?
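
For reference, downgrading would mean switching every Scala-cross-built artifact in the build to the _2.11 suffix and using a matching scala-library. A rough sketch of how the dependencies block could look (versions are illustrative only; pick the ones that match the cluster's Spark release):

dependencies {
    // Scala version aligned with the 2.11.12 reported by spark-submit/spark-shell
    implementation group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    implementation group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    implementation group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    implementation group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    implementation group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.4.5'
    implementation group: 'log4j', name: 'log4j', version: '1.2.17'
    implementation group: 'org.scalaj', name: 'scalaj-http_2.11', version: '2.4.2'
}

Many setups also mark the Spark artifacts as compileOnly so they are not bundled into the fat jar at all, since spark-submit already provides them at runtime.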

Solution

I would try spark-submit --version to find out which Scala version Spark is using.

With spark-submit --version I get this information:

[cloudera@quickstart scala-programming-for-data-science]$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera4
      /_/
                        
Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_202
Branch HEAD
Compiled by user jenkins on 2018-09-27T02:42:51Z
Revision 0ef0912caaab3f2636b98371eb29adb42978c595
Url git://github.mtv.cloudera.com/CDH/spark.git
Type --help for more information.

From the spark-shell you could try this to find out the Scala version:

scala> util.Properties.versionString
res3: String = version 2.11.8

The OS could be using another Scala version; in my case, as you can see, the Spark Scala version and the OS Scala version are different:

[cloudera@quickstart scala-programming-for-data-science]$ scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

Note from O'Reilly's Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia:

Dependency Conflicts

One occasionally disruptive issue is dealing with dependency conflicts in cases where a user application and Spark itself both depend on the same library. This comes up relatively rarely, but when it does, it can be vexing for users. Typically, this will manifest itself when a NoSuchMethodError, a ClassNotFoundException, or some other JVM exception related to class loading is thrown during the execution of a Spark job. There are two solutions to this problem. The first is to modify your application to depend on the same version of the third-party library that Spark does. The second is to modify the packaging of your application using a procedure that is often called "shading." The Maven build tool supports shading through advanced configuration of the plug-in shown in Example 7-5 (in fact, the shading capability is why the plugin is named maven-shade-plugin). Shading allows you to make a second copy of the conflicting package under a different namespace and rewrites your application’s code to use the renamed version. This somewhat brute-force technique is quite effective at resolving runtime dependency conflicts. For specific instructions on how to shade dependencies, see the documentation for your build tool.
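
The excerpt above refers to maven-shade-plugin; since this project builds with Gradle, the equivalent idea is the Shadow plugin's relocate feature. A hedged sketch only (the plugin version and the relocated package are examples; the original build does not use Shadow):

plugins {
    id 'scala'
    id 'com.github.johnrengelman.shadow' version '6.1.0'
}

shadowJar {
    zip64 true
    // Move a conflicting third-party package (Guava here, as an example)
    // into a private namespace so it no longer clashes with the copy Spark ships.
    relocate 'com.google.common', 'org.spark_module.shaded.com.google.common'
    manifest {
        attributes 'Main-Class': 'org.spark_module.SparkModule'
    }
}

Keep in mind that shading helps with libraries such as Guava; it is not a way around the scala-library mismatch itself, which still requires the compile-time and runtime Scala versions to agree.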
