Pyspark append executor environment variable

Question

Is it possible to append a value to the PYTHONPATH of a worker in Spark?

I know it is possible to go to each worker node, configure the spark-env.sh file and do it there, but I want a more flexible approach.

I am trying to use the setExecutorEnv method, but with no success:

conf = SparkConf().setMaster("spark://192.168.10.11:7077") \
              .setAppName('myname') \
              .set("spark.cassandra.connection.host", "192.168.10.11") \
              .setExecutorEnv('PYTHONPATH', '$PYTHONPATH:/custom_dir_that_I_want_to_append/')

It creates a pythonpath env. variable on each executor, forces it to be lowercase, and does not interpret $PYTHONPATH to append the value.

I end up with two different env. variables:

pythonpath  :  $PYTHONPATH:/custom_dir_that_I_want_to_append
PYTHONPATH  :  /old/path/to_python

The first one is dynamically created and the second one already existed before.

Does anybody know how to do this?

Answer

I found it out myself...

The problem is not with Spark, but with ConfigParser.

Based on this answer, I fixed the ConfigParser to always preserve case.
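
For reference, ConfigParser lower-cases option names through its optionxform hook; a minimal sketch of the case-preserving fix, assuming the executor environment is kept in an .ini-style config (the section name below is hypothetical):

import configparser  # "ConfigParser" on Python 2

parser = configparser.ConfigParser()
# By default optionxform lower-cases option names, which is what turned
# PYTHONPATH into pythonpath; overriding it with str preserves the case.
parser.optionxform = str
parser.read_string("[executor_env]\nPYTHONPATH = /custom_dir_that_I_want_to_append/\n")
print(parser.items('executor_env'))  # [('PYTHONPATH', '/custom_dir_that_I_want_to_append/')]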

After this, I found out that the default Spark behavior is to append the value to an existing worker env. variable when one with the same name already exists.

So, it is not necessary to reference $PYTHONPATH with the dollar sign:

.setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/')
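
Putting it together, a minimal sketch of the configuration plus a quick check of what the executors actually see (master, app name and host are taken from the question; the directory and the verification step are illustrative):

import os
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://192.168.10.11:7077")
        .setAppName('myname')
        .set("spark.cassandra.connection.host", "192.168.10.11")
        .setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/'))

sc = SparkContext(conf=conf)

# Run a single task and print the PYTHONPATH the executor sees; the custom
# directory should have been appended to the worker's existing value.
print(sc.parallelize([0], 1).map(lambda _: os.environ.get('PYTHONPATH')).collect())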
