Stream data using Spark from a particular partition within Kafka topics
Problem description
I have already seen a similar question (click here).
But I still want to know whether it is really not possible to stream data from a particular partition. I have used Kafka ConsumerStrategies in the Spark Streaming subscribe method:
ConsumerStrategies.Subscribe[String, String](topics, kafkaParams, offsets)
This is the code snippet I tried for subscribing to a topic and a partition:
val topics = Array("cdc-classic")
val topic = "cdc-classic"
val partition = 2
// I am not clear with this line (I tried to set the topic and the partition number as 2)
val offsets = Map(new TopicPartition(topic, partition) -> 2L)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams, offsets))
But when I run this code I get the following exception:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {cdc-classic-2=2}
at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:878)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:525)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1110)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.poll(CachedKafkaConsumer.scala:99)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:70)
Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {cdc-classic-2=2}
at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:878)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:525)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1110)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.poll(CachedKafkaConsumer.scala:99)
P.S.: cdc-classic is the name of a topic with 17 partitions.
Recommended answer
Specify the partition number and the starting offset of that partition in this line to stream the data:
Map(new TopicPartition(topic, partition) -> 2L)
where,
partition is the partition number
2L refers to the starting offset of the partition.
Then we can stream the data from the selected partition.
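For reference, here is a minimal, self-contained sketch of the approach described above using the spark-streaming-kafka-0-10 API. The broker address, group id, application name and batch interval are placeholders that are not part of the original question, and the starting offset given for the partition must actually exist on the broker, otherwise the OffsetOutOfRangeException shown above is thrown again.

import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val conf = new SparkConf().setAppName("cdc-classic-partition-stream") // placeholder app name
val ssc = new StreamingContext(conf, Seconds(10))                     // placeholder batch interval

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",            // placeholder broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "cdc-classic-consumer"                 // placeholder group id
)

val topic = "cdc-classic"
val partition = 2
// Start reading partition 2 of the topic at offset 2; this offset must lie within the
// range currently retained by the broker for that partition.
val offsets = Map(new TopicPartition(topic, partition) -> 2L)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Array(topic), kafkaParams, offsets))

// Print partition, offset and value of each record to verify where the data comes from.
stream.map(record => (record.partition, record.offset, record.value)).print()

ssc.start()
ssc.awaitTermination()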