How to get the number of workers (executors) in PySpark?

Problem description

I need to use this parameter, so how can I get the number of workers? In Scala I can call sc.getExecutorMemoryStatus to get the available number of workers, but in PySpark there doesn't seem to be any API exposed to get this number.

Recommended answer

In Scala, getExecutorStorageStatus and getExecutorMemoryStatus both return the number of executors including the driver, as in the example snippet below:

    /** Method that just returns the current active/registered executors
      * excluding the driver.
      * @param sc The spark context to retrieve registered executors.
      * @return a list of executors each in the form of host:port.
      */
    def currentActiveExecutors(sc: SparkContext): Seq[String] = {
      val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
      val driverHost: String = sc.getConf.get("spark.driver.host")
      allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
    }

But this is not implemented in the Python API.

@DanielDarabos' answer also confirms this.

The equivalent of this in Python:

sc.getConf().get("spark.executor.instances")
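Note that spark.executor.instances only appears in the conf when it was set explicitly (for example via --num-executors or a --conf flag); with dynamic allocation it may be absent. A minimal sketch, assuming the property may be missing and using a fallback of 1 purely as an assumed default:

    # Read the requested executor count from the conf.
    # The "1" fallback is an assumed default for the case where
    # spark.executor.instances was never set explicitly.
    num_executors = int(sc.getConf().get("spark.executor.instances", "1"))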

Edit (Python):

Possibly sc._conf.get('spark.executor.instances')
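If you need the number of executors that are actually registered with the driver, rather than the number requested in the conf, one workaround is to call the underlying Scala SparkContext through the py4j gateway, mirroring the Scala helper above. This is only a sketch: it relies on the private sc._jsc attribute, which is not a public API and may change between Spark versions.

    def current_active_executors(sc):
        """Number of registered executors, excluding the driver."""
        # getExecutorMemoryStatus() returns a Scala Map keyed by "host:port"
        # that also contains an entry for the driver itself.
        status = sc._jsc.sc().getExecutorMemoryStatus()
        return status.size() - 1  # drop the driver's own entry

    print(current_active_executors(sc))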
