Spark Streaming: How does communication between Spark and Kafka happen?


Question

I would like to understand how communication between the Kafka and Spark (Streaming) nodes takes place. I have the following questions.

  1. If the Kafka servers and the Spark nodes are in two separate clusters, how does the communication take place, and what steps are needed to configure them?
  2. If both are in the same cluster but on different nodes, how does the communication happen?

By communication I mean: is it RPC or socket communication? I would like to understand the internal anatomy.

Any help is appreciated.

Thanks in advance.

Answer

First of all, it doesn't matter whether the Kafka nodes and the Spark nodes are in the same cluster or not, but they must be able to connect to each other (open the required ports in the firewall).
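In other words, the only hard requirement is plain TCP reachability from the Spark driver and executors to the Kafka brokers. A minimal reachability probe can be sketched like this (the host name and the default broker port 9092 in the commented example are placeholders, not values from the question):

```python
import socket

def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe a (hypothetical) Kafka broker on the default port.
# broker_reachable("kafka-broker-1.example.com", 9092)
```

Running such a probe from every Spark worker node quickly shows whether a firewall is in the way before any Spark/Kafka configuration is debugged.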

There are two ways to read from Kafka with Spark Streaming: the older KafkaUtils.createStream() API and the newer KafkaUtils.createDirectStream() method.

I don't want to get into the differences between them; they are well documented (in short, the direct stream is better).

To address your question of how the communication happens (the internal anatomy): the best way to find out is to look at the Spark source code.

The createStream() API uses a set of Kafka consumers taken directly from the official org.apache.kafka packages. These Kafka consumers have their own client, called NetworkClient, whose source you can inspect. In short, the NetworkClient communicates over sockets.
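To make "communicates over sockets" concrete: Kafka frames every request and response on the wire as a 4-byte big-endian length followed by the payload. The toy client and one-shot "broker" below mimic only that framing over a plain TCP socket; the payload contents and the serve_once/fetch helpers are invented for illustration and are not Kafka's actual protocol:

```python
import socket
import struct

def _recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a blocking socket."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def serve_once(srv: socket.socket) -> None:
    """Toy 'broker': answer one length-prefixed request with a length-prefixed reply."""
    conn, _ = srv.accept()
    with conn:
        (size,) = struct.unpack(">i", _recv_exact(conn, 4))
        payload = _recv_exact(conn, size)
        reply = b"reply:" + payload
        conn.sendall(struct.pack(">i", len(reply)) + reply)

def fetch(host: str, port: int, request: bytes) -> bytes:
    """Toy client: send a length-prefixed request, read the length-prefixed
    response -- the same framing pattern a Kafka client uses on the wire."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack(">i", len(request)) + request)
        (size,) = struct.unpack(">i", _recv_exact(sock, 4))
        return _recv_exact(sock, size)
```

The length prefix is what lets the client know where one response ends and the next begins on a long-lived connection, which is why both the old and the new consumer keep a single socket open per broker rather than reconnecting per request.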

The createDirectStream() API does use Kafka's SimpleConsumer (from the kafka.consumer package of the Kafka core libraries). The SimpleConsumer class reads from Kafka through a java.nio.channels.ReadableByteChannel, an interface that java.nio.channels.SocketChannel implements, so in the end it is done with sockets as well, just a bit more indirectly, through Java's non-blocking I/O APIs.
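Python's selectors module plays roughly the same role as java.nio's Selector paired with a non-blocking SocketChannel. The read_nonblocking helper below is a hypothetical sketch of that pattern (it is not taken from Spark or Kafka): register the socket with a selector, wait until it is readable, then pull bytes off it without ever blocking on recv itself.

```python
import selectors
import socket

def read_nonblocking(sock: socket.socket, nbytes: int, timeout: float = 5.0) -> bytes:
    """Read up to nbytes from a socket in non-blocking mode, waiting for
    readability via a selector (the Python analog of java.nio's Selector)."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    chunks, remaining = [], nbytes
    while remaining > 0:
        if not sel.select(timeout):
            break  # timed out waiting for the socket to become readable
        data = sock.recv(remaining)
        if not data:
            break  # peer closed the connection
        chunks.append(data)
        remaining -= len(data)
    sel.close()
    return b"".join(chunks)
```

The payoff of this style is that one thread can multiplex many broker connections: instead of one blocking read per socket, the selector reports whichever sockets have data ready.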

So, to answer your question: it is done with sockets.
