Difference between job, task and subtask in Flink
I'm new to Flink and am trying to understand:
- job
- task
- subtask
I searched the docs but still don't get it. What's the main difference between them?
Tasks and sub-tasks are explained here -- https://ci.apache.org/projects/flink/flink-docs-release-1.7/concepts/runtime.html#tasks-and-operator-chains:
A task is an abstraction representing a chain of operators that could be executed in a single thread. Something like a keyBy (which causes a network shuffle to partition the stream by some key) or a change in the parallelism of the pipeline will break the chaining and force operators into separate tasks. In the diagram above, the application has three tasks.
A subtask is one parallel slice of a task. This is the schedulable, runnable unit of execution. In the diagram above, the application is to be run with a parallelism of two for the source/map and keyBy/Window/apply tasks, and a parallelism of one for the sink -- resulting in a total of 5 subtasks.
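To make the counting concrete, here is a minimal sketch in plain Python (not the Flink API) that reproduces the numbers from the example. The `Operator` class, the `shuffles_input` flag and both helper functions are hypothetical names invented for this illustration, and the chaining rule is deliberately reduced to the two conditions mentioned above (a keyBy-style shuffle or a change in parallelism); Flink's real chaining logic checks more than that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for one step of the pipeline (not a Flink class)."""
    name: str
    parallelism: int
    # True if the operator's input arrives via a keyBy-style network shuffle.
    shuffles_input: bool = False


def chain_into_tasks(operators):
    """Group consecutive operators into tasks (operator chains).

    Simplified rule: an operator joins the previous chain unless its input
    is shuffled or its parallelism differs from the previous operator's.
    """
    tasks = []
    for op in operators:
        can_chain = (
            bool(tasks)
            and not op.shuffles_input
            and tasks[-1][-1].parallelism == op.parallelism
        )
        if can_chain:
            tasks[-1].append(op)
        else:
            tasks.append([op])
    return tasks


def count_subtasks(tasks):
    """Each task contributes one subtask per unit of parallelism."""
    return sum(task[0].parallelism for task in tasks)


# The pipeline from the example: source/map and keyBy/window/apply run
# with parallelism 2, the sink with parallelism 1.
pipeline = [
    Operator("source", 2),
    Operator("map", 2),
    Operator("keyBy/window/apply", 2, shuffles_input=True),
    Operator("sink", 1),
]

tasks = chain_into_tasks(pipeline)
print(len(tasks))             # 3 tasks
print(count_subtasks(tasks))  # 5 subtasks (2 + 2 + 1)
```

The source and map operators end up in one chain because nothing separates them, the keyBy forces a new task, and the sink starts another because its parallelism drops from 2 to 1.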
A job is a running instance of an application. Clients submit jobs to the jobmanager, which slices them into subtasks and schedules those subtasks for execution by the taskmanagers.
Update:
The community decided to re-align the definitions of task and sub-task to match how these terms are used in the code -- which means that task and sub-task now mean the same thing: exactly one parallel instance of an operator or operator chain. See the Glossary for more details.
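As a rough illustration of the updated terminology, the sketch below expands each operator chain from the example into its parallel instances. It is plain Python rather than Flink code; the `parallel_instances` helper and the `(index/parallelism)` label format are assumptions chosen for readability. Under the re-aligned definition, every line it prints corresponds to one task (equivalently, one sub-task).

```python
def parallel_instances(chains):
    """Expand (chain name, parallelism) pairs into one label per parallel instance."""
    return [
        f"{name} ({index}/{parallelism})"
        for name, parallelism in chains
        for index in range(1, parallelism + 1)
    ]


# Operator chains and parallelism taken from the example pipeline.
chains = [("source -> map", 2), ("keyBy/window/apply", 2), ("sink", 1)]

instances = parallel_instances(chains)
for label in instances:
    print(label)
```

This prints five labels, from `source -> map (1/2)` through `sink (1/1)`, matching the five units of execution counted in the original explanation.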