How to use a custom config file for SparkSession (without using spark-submit to submit the application)?
Question
I have a standalone Python script that creates a SparkSession by invoking the following line of code, and I can see that it configures the Spark session perfectly, as specified in the spark-defaults.conf file.
spark = SparkSession.builder.appName("Tester").enableHiveSupport().getOrCreate()
If I want to pass as a parameter another file that contains the Spark configuration to be used instead of spark-defaults.conf, how can I specify this while creating the SparkSession?
I can see that I can pass a SparkConf object, but is there a way to create one automatically from a file containing all the configurations?
Do I have to parse the input file and set the appropriate configuration myself?
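There is no builder option that takes an alternative defaults file directly, so one possible approach is indeed to parse the properties-style file yourself and feed the pairs to SparkConf.setAll. A minimal sketch, assuming the file uses the same whitespace-separated key/value format as spark-defaults.conf (the helper name and file path are illustrative, not part of any Spark API):

```python
def load_spark_properties(path):
    """Parse a spark-defaults.conf-style file into (key, value) pairs.

    Blank lines and lines starting with '#' are skipped; each remaining
    line is split on the first run of whitespace, matching the format
    Spark itself reads from spark-defaults.conf.
    """
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                pairs.append((parts[0], parts[1].strip()))
    return pairs

# With pyspark installed, the pairs can then be applied to a SparkConf
# and handed to the builder (path is a placeholder):
# from pyspark import SparkConf
# from pyspark.sql import SparkSession
# conf = SparkConf().setAll(load_spark_properties("/path/to/my-spark.conf"))
# spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
```

Note that values set this way are applied at session creation, so any settings that must be known before the JVM starts (e.g. spark.driver.memory in local mode) may not take effect.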
Answer
If you don't use spark-submit, your best option here is to override SPARK_CONF_DIR. Create a separate directory for each configuration set:
$ cd configs
$ tree
.
├── conf1
│   ├── docker.properties
│   ├── fairscheduler.xml
│   ├── log4j.properties
│   ├── metrics.properties
│   ├── spark-defaults.conf
│   ├── spark-defaults.conf.template
│   └── spark-env.sh
└── conf2
    ├── docker.properties
    ├── fairscheduler.xml
    ├── log4j.properties
    ├── metrics.properties
    ├── spark-defaults.conf
    ├── spark-defaults.conf.template
    └── spark-env.sh
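A layout like the one above can be created with a few shell commands; the directory names and property values below are placeholders for illustration:

```shell
# Create one directory per configuration set (illustrative paths).
mkdir -p configs/conf1 configs/conf2

# Each set gets its own spark-defaults.conf with whitespace-separated
# key/value pairs, the same format Spark reads by default.
cat > configs/conf1/spark-defaults.conf <<'EOF'
spark.app.name Tester
spark.executor.memory 2g
EOF

cat > configs/conf2/spark-defaults.conf <<'EOF'
spark.app.name Tester
spark.executor.memory 8g
EOF
```

Files you don't need to vary (log4j.properties, spark-env.sh, etc.) can simply be copied unchanged into each set from $SPARK_HOME/conf.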
And set the environment variable before you initialize any JVM-dependent objects:
import os
from pyspark.sql import SparkSession
os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf1"
spark = SparkSession.builder.getOrCreate()
or
import os
from pyspark.sql import SparkSession
os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf2"
spark = SparkSession.builder.getOrCreate()
This is a workaround and might not work in complex scenarios.
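To make the choice of configuration set a runtime parameter rather than a hard-coded path, the directory can be taken from the command line and exported before any Spark object is created; the --conf-dir flag name below is just an illustration:

```python
import argparse
import os


def select_conf_dir(argv=None):
    """Set SPARK_CONF_DIR from a command-line flag and return it.

    This must run before the first SparkSession/SparkContext is created,
    because Spark reads the variable only once, at JVM startup.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--conf-dir", required=True,
                        help="Directory containing spark-defaults.conf etc.")
    args = parser.parse_args(argv)
    os.environ["SPARK_CONF_DIR"] = args.conf_dir
    return args.conf_dir

# With pyspark installed, a session started afterwards picks up the set:
# from pyspark.sql import SparkSession
# select_conf_dir()
# spark = SparkSession.builder.getOrCreate()
```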