Apache Flink streaming in cluster does not split jobs with workers


Problem Description

My objective is to setup a high throughput cluster using Kafka as source & Flink as the stream processing engine. Here's what I have done.

I have set up a 2-node cluster with the following configuration on the master and the slave.

Master flink-conf.yaml

jobmanager.rpc.address: <MASTER_IP_ADDR> #localhost

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 256

taskmanager.heap.mb: 512

taskmanager.numberOfTaskSlots: 50

parallelism.default: 100

Slave flink-conf.yaml

jobmanager.rpc.address: <MASTER_IP_ADDR> #localhost

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 512 #256

taskmanager.heap.mb: 1024 #512

taskmanager.numberOfTaskSlots: 50

parallelism.default: 100

The slaves file on the Master node looks like this:

<SLAVE_IP_ADDR>
localhost

The flink setup on both nodes is in a folder which has the same name. I start up the cluster on the master by running

bin/start-cluster-streaming.sh

This starts up the task manager on the slave node.

My input source is Kafka. Here is the snippet.

final StreamExecutionEnvironment env = 
    StreamExecutionEnvironment.getExecutionEnvironment();

DataStreamSource<String> stream = 
    env.addSource(
    new KafkaSource<String>(kafkaUrl,kafkaTopic, new SimpleStringSchema()));
stream.addSink(stringSinkFunction);

env.execute("Kafka stream");

Here is my sink function:

public class MySink implements SinkFunction<String> {

    private static final long serialVersionUID = 1L;

    @Override
    public void invoke(String arg0) throws Exception {
        processMessage(arg0);
        System.out.println("Processed Message");
    }
}

Here are the Flink Dependencies in my pom.xml.

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-core</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>0.9.0</version>
</dependency>

Then I run the packaged jar with this command on the master

bin/flink run flink-test-jar-with-dependencies.jar

However, when I insert messages into the Kafka topic, I am able to account for all messages coming in from the topic (via debug messages in the invoke method of my SinkFunction implementation) on the Master node alone.

In the Job manager UI I am able to see 2 Task managers as below:

Also, the dashboard looks like this:

Questions:

  1. Why is the slave node not getting any tasks to execute?
  2. Am I missing some configuration?

Recommended Answer

When reading from a Kafka source in Flink, the maximum degree of parallelism for the source task is limited by the number of partitions of a given Kafka topic. A Kafka partition is the smallest unit which can be consumed by a source task in Flink. If there are more partitions than source tasks, then some tasks will consume multiple partitions.

Consequently, in order to supply input to all of your 100 tasks, you should assure that your Kafka topic has at least 100 partitions.
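As an illustration, assuming a topic named `my-topic` (a hypothetical name) and Kafka's standard command-line tools from that era, the partition count could be checked and raised along these lines:

```shell
# Inspect the topic's current partition count (topic name is an assumption)
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic

# Raise the partition count to 100 so that all 100 parallel source tasks
# receive input. Note: Kafka partitions can only be increased, never decreased.
bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my-topic --partitions 100
```

Increasing partitions changes how keys map to partitions, so do this before relying on any key-based partitioning.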

If you cannot change the number of partitions of your topic, then it is also possible to initially read from Kafka using a lower degree of parallelism using the setParallelism method. Alternatively, you can use the rebalance method which will shuffle your data across all available tasks of the preceding operation.
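A minimal sketch of that alternative, reusing the variables from the question's snippet and assuming, for illustration, a topic with 4 partitions:

```java
// Read from Kafka with a source parallelism matching the topic's
// 4 partitions, then rebalance() to redistribute the records
// round-robin across all downstream sink tasks.
DataStreamSource<String> stream = env.addSource(
    new KafkaSource<String>(kafkaUrl, kafkaTopic, new SimpleStringSchema()));

stream.setParallelism(4)  // one source task per Kafka partition
      .rebalance()        // shuffle across all available downstream tasks
      .addSink(stringSinkFunction);  // sink runs at the default parallelism (100)
```

With this setup only 4 source tasks consume from Kafka, but the rebalance ensures every sink task, on the master and the slave alike, receives a share of the data.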
