无法使用spark读取kafka主题数据 [英] unable to read kafka topic data using spark

查看:30
本文介绍了无法使用spark读取kafka主题数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我创建的名为 "sampleTopic"

sid,Believer  

其中第一个参数是用户名,第二个参数是用户经常收听的歌曲名称.现在,我已经使用上述主题名称启动了 zookeeperKafka 服务器producer.我已经使用 CMD 为该主题输入了上述数据.现在,我想在 spark 中读取主题执行一些聚合,并将其写回流.下面是我的代码:

Where the first argument is the username and the second argument is the song name which the user frequently listens. Now, I have started zookeeper, Kafka server, and producer with the topic name as mentioned above. I have entered the above data for that topic using CMD. Now, I want to read the topic in spark perform some aggregation, and write it back to stream. Below is my code:

package com.sparkKafka
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
object SparkKafkaTopic {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("SparkKafka").master("local[*]").getOrCreate()
    println("hey")
    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "sampleTopic1")
      .load()
    val query = df.writeStream
      .outputMode("append")
      .format("console")
      .start().awaitTermination()


  }
}

但是,当我执行上面的代码时,它给出了:

However, when I execute the above code it gives :

    +----+--------------------+------------+---------+------+--------------------+-------------+
| key|               value|       topic|partition|offset|           timestamp|timestampType|
+----+--------------------+------------+---------+------+--------------------+-------------+
|null|[73 69 64 64 68 6...|sampleTopic1|        0|     4|2020-05-31 12:12:...|            0|
+----+--------------------+------------+---------+------+--------------------+-------------+

下面也有无限循环消息

20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.
20/05/31 11:56:12 INFO Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-0d6807b9-fcc9-4847-abeb-f0b81ab25187--264582860-driver-0] Resetting offset for partition sampleTopic1-0 to offset 4.

我需要输出如下内容:

I need output something like below:

根据 Srinivas 的建议,我得到了以下输出:

As modified on the suggestion by Srinivas I got the following output:

不确定这里到底出了什么问题.请指导我完成它.

Not sure what exactly is wrong over here. Please guide me through it.

推荐答案

尝试将 spark-sql-kafka 库添加到您的构建文件中.检查下面.

Try to add spark-sql-kafka library to your build file. Check below.

build.sbt

libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"  
// Change to Your spark version 

pom.xml

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>    // Change to Your spark version
</dependency>

更改您的代码,如下所示

Change your code like below

    package com.sparkKafka
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions._
    case class KafkaMessage(key: String, value: String, topic: String, partition: Int, offset: Long, timestamp: String)

    object SparkKafkaTopic {

      def main(args: Array[String]) {
        //val spark = SparkSession.builder().appName("SparkKafka").master("local[*]").getOrCreate()
        println("hey")
        val spark = SparkSession.builder().appName("SparkKafka").master("local[*]").getOrCreate()
        import spark.implicits._
        val mySchema = StructType(Array(
          StructField("userName", StringType),
          StructField("songName", StringType)))
        val df = spark
          .readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "sampleTopic1")
          .load()

        val query = df
          .as[KafkaMessage]
          .select(split($"value", ",")(0).as("userName"),split($"value", ",")(1).as("songName"))
          .writeStream
          .outputMode("append")
          .format("console")
          .start()
          .awaitTermination()
      }
    }

     /*
        +------+--------+
        |userid|songname|
        +------+--------+
        |   sid|Believer|
        +------+--------+
       */

      }
    }

这篇关于无法使用spark读取kafka主题数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆