In Spark's client mode, does the driver need network access to remote executors?


Question

When using Spark in client mode (e.g. yarn-client), does the local machine that runs the driver communicate directly with the cluster worker nodes that run the remote executors?

If yes, does it mean that the machine running the driver needs network access to the worker nodes? That is, does the master node request resources from the cluster and return the IP addresses/ports of the worker nodes to the driver, so that the driver can initiate communication with the worker nodes?

If not, how does client mode actually work?

If yes, does it mean that client mode won't work if the cluster is configured so that the worker nodes are not visible from outside the cluster, and that one has to use cluster mode instead?

Thanks!

Recommended answer

The driver connects to the Spark master and requests a context. The Spark master then passes the driver's details to the Spark workers, so that they can communicate with the driver and get instructions on what to do.
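Because the workers are given the driver's details and talk to the driver directly, a quick sanity check is to test from a worker node whether the driver's address and port accept TCP connections. The following is a minimal sketch using only the Python standard library. The helper name `can_reach` is hypothetical, and the host and port you pass in are assumptions that must match your own setup (for example, the port fixed via `spark.driver.port`).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run on a worker node, a call such as `can_reach("driver.example.internal", 40000)` (both values are placeholders) returning False suggests that the executors on that node would not be able to connect back to the driver either.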

This means that the driver node must be reachable on the network by the workers, and its IP must be one that is visible to them. For example, if the driver is behind NAT while the workers are in a different network, it won't work, and you'll see errors on the workers saying that they failed to connect to the driver.
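When the driver machine has more than one address, Spark can be told which address to advertise to the workers and which ports to listen on, via the documented properties `spark.driver.host`, `spark.driver.bindAddress`, `spark.driver.port` and `spark.driver.blockManager.port`. The sketch below only builds the corresponding `--conf` arguments for `spark-submit`. The helper name `driver_network_conf` and all example values are hypothetical, and setting these properties does not by itself make a driver behind NAT reachable: the advertised address and ports still have to be routable from the workers.

```python
def driver_network_conf(advertised_host, driver_port, block_manager_port,
                        bind_address="0.0.0.0"):
    """Build spark-submit --conf arguments that pin the driver's network settings."""
    conf = {
        # Address the workers are told to use when connecting to the driver.
        "spark.driver.host": advertised_host,
        # Local address the driver binds its listening sockets to.
        "spark.driver.bindAddress": bind_address,
        # Fixed ports, so that firewall rules can allow them explicitly.
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The returned list can be spliced into a command line, for example `["spark-submit", "--deploy-mode", "client"] + driver_network_conf("10.0.0.5", 40000, 40001)`, where the address and ports are placeholders for values the workers can actually reach.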
