Pyspark append executor environment variable


Question

Is it possible to append a value to the PYTHONPATH of a worker in Spark?

I know it is possible to go to each worker node, configure the spark-env.sh file and do it there, but I want a more flexible approach.

I am trying to use the setExecutorEnv method, but with no success:

from pyspark import SparkConf

conf = SparkConf().setMaster("spark://192.168.10.11:7077")\
              .setAppName('myname')\
              .set("spark.cassandra.connection.host", "192.168.10.11")\
              .setExecutorEnv('PYTHONPATH', '$PYTHONPATH:/custom_dir_that_I_want_to_append/')

It creates a pythonpath env. variable on each executor, forces the name to lower case, and does not interpret $PYTHONPATH to append to the existing value.

I end up with two different env. variables:

pythonpath  :  $PYTHONPATH:/custom_dir_that_I_want_to_append
PYTHONPATH  :  /old/path/to_python

The first one is dynamically created and the second one already existed before.

Does anyone know how to do this?

Solution

I figured it out myself...

The problem is not with Spark, but with ConfigParser.

Based on this answer, I fixed ConfigParser to always preserve case.
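As a rough sketch of that fix (assuming the settings are read from an ini-style file with Python's standard configparser module; the file name below is purely illustrative), overriding optionxform keeps option names such as PYTHONPATH from being lowercased:

import configparser

config = configparser.ConfigParser()
# By default ConfigParser lower-cases option names; using str as the
# transform returns each key unchanged, so PYTHONPATH stays upper case.
config.optionxform = str
config.read('spark_settings.ini')  # hypothetical file name, for illustration only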

After this, I found out that the default Spark behavior is to append the value to the existing worker env. variable if an env. variable with the same name already exists.

So, there is no need to reference $PYTHONPATH with the dollar sign:

.setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/')
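Put together with the configuration from the question (the master URL, app name and Cassandra host are just the values shown above), the working setup would look roughly like this:

from pyspark import SparkConf

conf = SparkConf().setMaster("spark://192.168.10.11:7077")\
                  .setAppName('myname')\
                  .set("spark.cassandra.connection.host", "192.168.10.11")\
                  .setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/')

With case preserved and the dollar sign dropped, the executor's existing PYTHONPATH is extended rather than replaced.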
