Azure Databricks: How to add Spark configuration in Databricks cluster

Problem Description

I am using a Spark Databricks cluster and want to add a customized Spark configuration.
There is Databricks documentation on this, but I am not getting any clue about how and what changes I should make. Can someone please share an example of configuring a Databricks cluster?
Is there any way to see the default configuration for Spark in a Databricks cluster?

Recommended Answer

To fine-tune Spark jobs, you can provide custom Spark configuration properties in a cluster configuration.

  1. On the cluster configuration page, click the Advanced Options toggle.
  2. Click the Spark tab and enter your properties in the Spark config text box (an example entry is shown below).
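
Each property goes on its own line in the Spark config box, as a space-separated key and value. A minimal illustrative entry, reusing the two properties mentioned elsewhere in this answer (any valid Spark property works here):

spark.executor.memory 4g
spark.sql.sources.partitionOverwriteMode DYNAMIC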

[OR]

When you configure a cluster using the Clusters API, set Spark properties in the spark_conf field in the Create cluster request or Edit cluster request.
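
As a sketch, a Create cluster request body might carry the properties like this; the cluster name, runtime version, node type, and worker count below are placeholder values, and only the spark_conf map is the part this answer is about:

{
  "cluster_name": "my-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "spark_conf": {
    "spark.executor.memory": "4g",
    "spark.sql.sources.partitionOverwriteMode": "DYNAMIC"
  }
}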

To set Spark properties for all clusters, create a global init script:

%scala
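// Write an init script to DBFS; at cluster start it creates a custom
// Spark defaults file in the driver's conf directory.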
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh","""
  |#!/bin/bash
  |
  |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
  |[driver] {
  |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
  |}
  |EOF
  """.stripMargin, true)

Reference: Databricks - Spark Configuration

Example: You can pick any Spark configuration you want to test; here I want to specify "spark.executor.memory 4g" as the custom configuration.

After the cluster is created, you can check the result of the custom configuration.
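
As for seeing the configuration a Databricks cluster is actually running with: you can list every Spark property explicitly set on the cluster (Databricks sets many out of the box) from a notebook. A hedged sketch using the standard SparkContext API:

%scala
// Print all Spark properties set on this cluster, sorted by key
sc.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k = $v") }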

Hope this helps.
