Getting an empty set while reading data from Kafka with Spark Streaming


Problem description

Hi, I am new to Spark Streaming. I am trying to read an XML file and send its lines to a Kafka topic. Here is my Kafka code, which sends data to kafka-console-consumer.

Code:

package org.apache.kafka.Kafka_Producer;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KafkaProducer {
    private static String sCurrentLine;

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        // Read the input file line by line and publish each line to the topic.
        try (BufferedReader br = new BufferedReader(new FileReader("/Users/sreeharsha/Downloads/123.txt"))) {
            while ((sCurrentLine = br.readLine()) != null) {
                System.out.println(sCurrentLine);
                kafka(sCurrentLine);
            }
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
        }
    }

    public static void kafka(String sCurrentLine) {
        // Configuration for the old Scala producer API that ships with Kafka 0.8.x.
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "kafka.producer.DefaultPartitioner");
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        // Note: a new producer is created and closed for every line; fine for a demo,
        // but a single shared producer would be more efficient.
        Producer<String, String> producer = new Producer<String, String>(config);
        producer.send(new KeyedMessage<String, String>("sample", sCurrentLine));
        producer.close();
    }
}

I can receive the data in kafka-console-consumer and can see the data I sent to the topic.
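
For anyone reproducing this, the topic can be watched with the console consumer that ships with Kafka 0.8.x (a sketch; it assumes a local ZooKeeper on its default port):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sample --from-beginning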

Now I need to stream the data that I sent to the topic using Spark Streaming. Here is the code:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import kafka.serializer.StringDecoder;

public class SparkStringConsumer {

    public static void main(String[] args) {

        SparkConf conf = new SparkConf()
                .setAppName("kafka-sandbox")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Micro-batch interval of 2 seconds.
        JavaStreamingContext ssc = new JavaStreamingContext(sc, new Duration(2000));

        // Connect directly to the broker (receiver-less direct stream)
        // and subscribe to the "sample" topic.
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092");
        Set<String> topics = Collections.singleton("sample");

        JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(
                ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Print a summary of each micro-batch and every record's value.
        directKafkaStream.foreachRDD(rdd -> {
            System.out.println("--- New RDD with " + rdd.partitions().size()
                    + " partitions and " + rdd.count() + " records");
            rdd.foreach(record -> System.out.println(record._2));
        });

        ssc.start();
        ssc.awaitTermination();
    }
}
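
Note that this consumer relies on Spark's Kafka 0.8 integration, which ships as a separate artifact. A sketch of pulling it in at submit time, assuming the Scala 2.11 build of Spark 2.0.0 (match the versions to your installation):

./spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 ...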

I am getting an empty set when submitting my job like this:

./spark-submit --class org.apache.spark_streaming.Spark_Kafka_Streaming.SparkStringConsumer --master local[4] Spark_Kafka_Streaming-0.0.1-SNAPSHOT.jar

The job runs, but every batch it receives comes back empty.

Using the following versions:

Spark - 2.0.0

Zookeeper - 3.4.6

Kafka - 0.8.2.1

Any suggestions would be appreciated.

Answer

Finally, after searching around online, I found the solution.

Don't use spark-submit and setMaster at the same time:

  • If you run the code from your IDE, use setMaster in the code.
  • If you run the jar via spark-submit, don't call setMaster in your code; let the --master flag decide (see the sketch below).
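
For the spark-submit case, a minimal sketch of the change to the consumer above (the only difference is the removed setMaster; everything else stays as posted):

SparkConf conf = new SparkConf().setAppName("kafka-sandbox"); // no setMaster here
JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext ssc = new JavaStreamingContext(sc, new Duration(2000));
// The master now comes from the command line, e.g.:
// ./spark-submit --master local[4] --class ... Spark_Kafka_Streaming-0.0.1-SNAPSHOT.jar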

One more thing: first run/submit your Spark jar, and only then send the data to the topic. The direct stream starts from the latest offsets by default, so messages produced before the job is running are never seen.
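
Alternatively, if already-produced messages should be picked up, the direct stream can be told to read the topic from the beginning (a sketch; "smallest" is the Kafka 0.8 spelling of what newer clients call "earliest"):

kafkaParams.put("auto.offset.reset", "smallest"); // start from the earliest available offset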

It works fine now.
