Spark fails with NoClassDefFoundError for org.apache.kafka.common.serialization.StringDeserializer

Problem description

I am developing a generic Spark application that listens to a Kafka stream using Spark and Java.

I am using kafka_2.11-0.10.2.2 and spark-2.3.2-bin-hadoop2.7; I also tried several other Kafka/Spark combinations before posting this question.

The code fails when loading the StringDeserializer class:

SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

Set<String> topicsSet = new HashSet<>();
topicsSet.add(topics);
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
kafkaParams.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
kafkaParams.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
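
For context, these Kafka parameters are normally handed to KafkaUtils.createDirectStream from the spark-streaming-kafka-0-10 integration. The original snippet stops before that call, so the following is only a sketch of the usual continuation, reusing jssc, topicsSet and kafkaParams from above:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Subscribe to the topic set with the consumer configuration built above.
// StringDeserializer must already be on the classpath when the class literal
// in the kafkaParams map is referenced, which is where the error surfaces.
JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topicsSet, kafkaParams));

stream.map(ConsumerRecord::value).print();  // e.g. print the message values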

The error I get is:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer

From Why does Spark application fail with "Exception in thread "main" java.lang.NoClassDefFoundError: ...StringDeserializer"? it seems that this could be a Scala version mismatch issue, but my pom.xml doesn't have that problem:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>yyy.iot.ckc</groupId>
<artifactId>sparkpoc</artifactId>
<version>1.0-SNAPSHOT</version>

<name>sparkpoc</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <java.version>1.8</java.version>

    <spark.scala.version>2.11</spark.scala.version>
    <spark.version>2.3.2</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

</dependencies>

<build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
        <plugins>
            <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
            <plugin>
                <artifactId>maven-clean-plugin</artifactId>
                <version>3.1.0</version>
            </plugin>
            <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
            <plugin>
                <artifactId>maven-resources-plugin</artifactId>
                <version>3.0.2</version>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
            </plugin>
            <plugin>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.22.1</version>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.0.2</version>
            </plugin>
            <plugin>
                <artifactId>maven-install-plugin</artifactId>
                <version>2.5.2</version>
            </plugin>
            <plugin>
                <artifactId>maven-deploy-plugin</artifactId>
                <version>2.8.2</version>
            </plugin>
            <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
            <plugin>
                <artifactId>maven-site-plugin</artifactId>
                <version>3.7.1</version>
            </plugin>
            <plugin>
                <artifactId>maven-project-info-reports-plugin</artifactId>
                <version>3.0.0</version>
            </plugin>
        </plugins>
    </pluginManagement>
</build>
</project>

The submit script I am using is:

./bin/spark-submit \
    --class "yyy.iot.ckc.KafkaDataModeler" \
    --master local[2] \
    ../sparkpoc/target/sparkpoc-1.0-SNAPSHOT.jar

Can anyone please point me in the right direction as to where I am going wrong?

Answer

Spark runs the program by launching a JVM instance, so if the required libraries (JARs) are not on that JVM's classpath we run into this runtime exception. The solution is to package all the dependent JARs along with the main JAR; the build section below will do that.

Also, as mentioned in https://stackoverflow.com/a/54583941/1224075, the spark-core and spark-streaming dependencies need to be declared with provided scope, because those libraries are implicitly provided by the Spark JVM.
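
One way to express this, reusing the properties from the question's POM, is shown below: spark-core and spark-streaming get provided scope, while spark-streaming-kafka-0-10 keeps the default compile scope so that the Kafka client classes (including StringDeserializer) end up inside the assembled JAR:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${spark.scala.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${spark.scala.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
<!-- spark-streaming-kafka-0-10 is NOT part of the Spark distribution,
     so it stays at compile scope and gets bundled by the assembly plugin -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_${spark.scala.version}</artifactId>
    <version>${spark.version}</version>
</dependency>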

The build section of the POM that worked for me:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2.1</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
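
With this in place, a regular Maven build should produce an additional artifact whose name ends in -jar-with-dependencies.jar, and that is the JAR to pass to spark-submit. The exact file name below is inferred from the question's POM (artifactId sparkpoc, version 1.0-SNAPSHOT), so adjust it if your coordinates differ:

mvn clean package

./bin/spark-submit \
    --class "yyy.iot.ckc.KafkaDataModeler" \
    --master local[2] \
    ../sparkpoc/target/sparkpoc-1.0-SNAPSHOT-jar-with-dependencies.jar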
