How to get the number of workers (executors) in PySpark?
Question
I need to use this parameter, so how can I get the number of workers? Like in Scala, I can call sc.getExecutorMemoryStatus to get the available number of workers. But in PySpark, it seems there's no API exposed to get this number.
Answer
In Scala, getExecutorStorageStatus and getExecutorMemoryStatus both return status for every executor, including the driver, so the executor count can be derived from either one, as in the example snippet below:
/** Method that just returns the current active/registered executors
  * excluding the driver.
  * @param sc The spark context to retrieve registered executors.
  * @return a list of executors each in the form of host:port.
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(!_.split(":")(0).equals(driverHost)).toList
}
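PySpark itself exposes no public wrapper for getExecutorMemoryStatus, but the same idea can be reached through the underlying JVM SparkContext. A minimal sketch, assuming access to the private _jsc attribute (the py4j gateway, an internal detail that may change between Spark versions):

# Count executors from PySpark via the JVM SparkContext.
# NOTE: `_jsc` is a private attribute (py4j gateway), not a stable public API.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# getExecutorMemoryStatus returns a Scala Map keyed by "host:port",
# with one entry per executor plus one for the driver.
num_entries = sc._jsc.sc().getExecutorMemoryStatus().size()
num_workers = num_entries - 1  # subtract the driver
print(num_workers)

Subtracting one mirrors the Scala snippet above, which filters out the driver by host.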
@DanielDarabos's answer also confirms this.
The equivalent in Python:
sc.getConf().get("spark.executor.instances")
Edit (Python):
possibly sc._conf.get('spark.executor.instances')
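As a hedged example of this configuration-based approach: SparkConf.get accepts a default value, and spark.executor.instances is only present when it was set explicitly (for example via --num-executors), so a fallback is useful, especially under dynamic allocation:

# Read the configured executor count, with a fallback.
# NOTE: "spark.executor.instances" is only set when passed explicitly
# (e.g. --num-executors); it does not reflect dynamic allocation.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
num_executors = int(sc.getConf().get("spark.executor.instances", "1"))
print(num_executors)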