Why is the parallel execution of an Apache Flink application slower than the sequential execution?


Question

I have an Apache Flink setup with one TaskManager and two processing slots. When I run an application with parallelism set to 1, the job takes around 33 seconds. When I increase the parallelism to 2, the job takes 45 seconds to complete.

I am running Flink on a Windows machine configured with 10 compute cores (4C + 6G). I want to achieve better results with 2 slots. What can I do?

Recommended answer

Distributed systems like Apache Flink are designed to run in data centers on hundreds of machines. They are not designed to parallelize computations on a single computer. Moreover, Flink targets large-scale problems. Jobs that run in seconds on a local machine are not the primary use case for Flink.

Parallelizing an application always causes overhead. Data has to be distributed and shared between processes and threads. Flink distributes data across TaskManager slots by serializing and deserializing it. Starting and coordinating distributed tasks is not free either.
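To get a feel for the serialization cost mentioned above, here is a minimal sketch that encodes an array of records into bytes and decodes it again, timing the round trip. It uses plain java.io streams purely for illustration; Flink has its own serializers, and the class name SerializationRoundTrip and the record layout are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only: this is NOT how Flink serializes records.
public class SerializationRoundTrip {

    // Encode the records as a length prefix followed by 8 bytes per record.
    static byte[] serialize(long[] records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.length);
            for (long record : records) {
                out.writeLong(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decode the buffer into a fresh array; the receiver always gets a copy.
    static long[] deserialize(byte[] buffer) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer))) {
            long[] records = new long[in.readInt()];
            for (int i = 0; i < records.length; i++) {
                records[i] = in.readLong();
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] records = new long[1_000_000];
        for (int i = 0; i < records.length; i++) {
            records[i] = i;
        }
        long start = System.nanoTime();
        long[] copy = deserialize(serialize(records));
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("Round trip of " + copy.length + " records took "
                + elapsedMicros + " microseconds");
    }
}
```

Every record that crosses such a boundary is written out and read back in, which is work a single-threaded run operating on in-memory objects never has to do.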

It is not surprising to observe longer execution times when scaling a small-scale problem with a distributed system on a single machine. Instead, you could port the application to a thread-parallel application that leverages shared memory.
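As a rough sketch of what a thread-parallel, shared-memory version might look like, the example below splits an in-memory array across worker threads with a standard ExecutorService. The workload (summing numbers) and the names SharedMemorySum and parallelSum are assumptions chosen for illustration, since the original job is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: the actual job from the question is unknown.
public class SharedMemorySum {

    // Sum the array using the given number of threads. All workers read the
    // same array directly, so no record is serialized or copied.
    static long parallelSum(long[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            // Combine the per-thread partial sums.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println("sum = " + parallelSum(data, 2));
    }
}
```

The only coordination here is handing out index ranges and adding up a few partial results. Whether this beats a single thread still depends on the workload, so it is worth measuring both.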
