How to use a custom config file for SparkSession (without using spark-submit to submit the application)?


Question

I have a standalone Python script that creates a SparkSession by invoking the following lines of code, and I can see that it configures the Spark session exactly as specified in the spark-defaults.conf file.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Tester").enableHiveSupport().getOrCreate()

If I want to pass, as a parameter, another file containing the Spark configuration to be used instead of spark-defaults.conf, how can I specify this while creating a SparkSession?

I can see that I can pass a SparkConf object, but is there a way to create one automatically from a file containing all the configurations?

Do I have to manually parse the input file and set the appropriate configuration myself?

Answer

If you don't use spark-submit, your best option here is to override SPARK_CONF_DIR. Create a separate directory for each configuration set:

$ configs tree           
.
├── conf1
│   ├── docker.properties
│   ├── fairscheduler.xml
│   ├── log4j.properties
│   ├── metrics.properties
│   ├── spark-defaults.conf
│   ├── spark-defaults.conf.template
│   └── spark-env.sh
└── conf2
    ├── docker.properties
    ├── fairscheduler.xml
    ├── log4j.properties
    ├── metrics.properties
    ├── spark-defaults.conf
    ├── spark-defaults.conf.template
    └── spark-env.sh

And set the environment variable before you initialize any JVM-dependent objects:

import os
from pyspark.sql import SparkSession

# Point Spark at the first configuration set; this must happen
# before the JVM is started.
os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf1"
spark = SparkSession.builder.getOrCreate()

import os
from pyspark.sql import SparkSession

# In a separate run, point Spark at the second configuration set.
os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf2"
spark = SparkSession.builder.getOrCreate()
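Note that each snippet is meant to run as its own script: SPARK_CONF_DIR is read only when the JVM starts, so a second getOrCreate() in the same process would simply return the already-created session. To check which configuration set was actually picked up, you can dump the effective configuration (a quick sanity check, not part of the original answer):

# Print the effective configuration; entries from the chosen
# spark-defaults.conf should show up here.
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)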

This is a workaround and might not work in complex scenarios.
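If you do want exactly what the question asks, i.e. building the session from an arbitrary config file, here is a minimal sketch of the manual approach: parse the file into a SparkConf and pass it to the builder. It assumes the file (my-spark.conf here, a hypothetical name) uses the same whitespace-separated "key value" format as spark-defaults.conf.

from pyspark import SparkConf
from pyspark.sql import SparkSession

def conf_from_file(path):
    # Build a SparkConf from a spark-defaults.conf-style file:
    # one "key value" pair per line, '#' starts a comment.
    conf = SparkConf()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                conf.set(parts[0], parts[1].strip())
    return conf

spark = (SparkSession.builder
         .config(conf=conf_from_file("my-spark.conf"))
         .enableHiveSupport()
         .getOrCreate())

Unlike the SPARK_CONF_DIR approach, this only covers settings that can go through SparkConf; files such as log4j.properties or spark-env.sh still require the configuration directory.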
