Compatibility issue with Scala and Spark for compiled jars


Question


I am quite new to Scala and Spark. I ran into a version-related error in Scala, so I tried changing the Scala version in pom.xml in order to run my jar file on the cluster. Eventually I found that the Scala version that produced a working jar was 2.11.


However, I am a little curious about the Scala version, because the version installed on my cluster was 2.10.4 when I ran scala -version in the bash shell (not 2.11). What is even stranger is that the jar file didn't work when I changed the Scala version from 2.11 to 2.10 in pom.xml to match the installed Scala version. It threw the error below.

Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
        at com.bistel.scala.App$.main(App.scala:17)
        at com.bistel.scala.App.main(App.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


When I changed the Scala version back to 2.11, it worked fine again and didn't throw any errors. I want to understand the compatibility between Spark and Scala, but the mismatch described above confuses me.


Any help will be appreciated.


I attached the two versions of pom.xml (the first uses Scala 2.11, the other 2.10).


Below is the one that works well. Its Scala version is 2.11, as indicated.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bistel.scala</groupId>
  <artifactId>scala-001</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2015</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.8</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

      <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-reflect</artifactId>
          <version>${scala.version}</version>
      </dependency>


      <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.0.1</version>
    </dependency>

      <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.11</artifactId>
          <version>2.0.1</version>
      </dependency>


    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-junit_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>


    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>2.2.4</version>
      <scope>test</scope>
    </dependency>


  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issue like NoDefClassError,... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>


Below is the one that didn't work. Its Scala version is 2.10, as indicated.

 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bistel.scala</groupId>
  <artifactId>scala-001</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2015</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.10.4</scala.version>
    <scala.compat.version>2.10</scala.compat.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.0.1</version>
    </dependency>

      <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.10</artifactId>
          <version>2.0.1</version>
      </dependency>


    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-junit_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>


    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>2.2.4</version>
      <scope>test</scope>
    </dependency>


  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issue like NoDefClassError,... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>


Answer


There are several elements which need to be compatible:

1. The version of Spark on your driver (and on the executors, if you are not using YARN to distribute the jars).
2. The version of Spark in your pom.
3. The version of Scala used to compile your source.


1) Version of Spark on your driver (+ executors): you have some version of Spark installed, and that Spark distribution was compiled against some Scala version (the default for Spark 2.x.x is Scala 2.11). The Scala version installed on the cluster does not matter; what matters is what is bundled inside the Spark jars. The same version of Spark should be installed on all nodes of the cluster (YARN allows you to distribute these jars when you submit a new application, so in that case you can have multiple versions of Spark running side by side).
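A quick way to confirm which versions the installed Spark actually ships with is to start spark-shell on a cluster node: its startup banner prints both the Spark version and the Scala version it runs on. The same information can be printed explicitly; the lines below are a sketch to run inside spark-shell, assuming a Spark recent enough to expose org.apache.spark.SPARK_VERSION:

// Run inside spark-shell on the cluster
println(scala.util.Properties.versionString)  // Scala version bundled with the Spark jars, e.g. "version 2.11.8"
println(org.apache.spark.SPARK_VERSION)       // version of the installed Spark distribution, e.g. "2.0.1"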


2) Version of Spark in your pom: when you create your pom, you include some dependencies, including Spark. These have _2.10 or _2.11 appended to their artifact names, which indicates the Scala version those dependencies were built for. The Spark version can differ from the installed one, since Spark has backward compatibility (at least within a major version), so you can use Spark 2.0.1 in your pom even though your cluster has 2.1.0 (although the other way around is not guaranteed). The Scala version, however, must be the same as the one the installed Spark was built with.
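One way to keep the artifact suffix from drifting out of sync with the rest of the build is to drive it from the same property the test dependencies in the posted poms already use. A minimal sketch, assuming the cluster's Spark was built with Scala 2.11 (as it apparently was here):

  <properties>
    <!-- must match the Scala version the installed Spark was built with -->
    <scala.version>2.11.8</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <!-- the suffix now follows scala.compat.version automatically -->
      <artifactId>spark-core_${scala.compat.version}</artifactId>
      <version>2.0.1</version>
    </dependency>
  </dependencies>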


3) Lastly, there is the Scala version used for compilation. This, again, should be the same as the Scala version of the installed Spark.


You should also probably set the scope of your Spark dependencies to provided, to avoid conflicts with the installed version.
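For example (a sketch based on the spark-core dependency from the posted poms):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.compat.version}</artifactId>
      <version>2.0.1</version>
      <!-- not packaged into your jar; the Spark installed on the cluster is used at runtime -->
      <scope>provided</scope>
    </dependency>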
