I/O operations with Azure Databricks REST Jobs API

Problem description

I would like to execute the content of an Azure Databricks notebook via the REST Jobs API in the following manner:

  1. pass a set of key:value parameters to the notebook's PySpark context
  2. execute some Python computation informed by those parameters

For point 1 I use the following (as suggested by the documentation here):

curl -n -X POST -H 'Content-Type: application/json' -d '{"name": "endpoint job", "existing_cluster_id": "xxx", "notebook_task": {"notebook_path": "path"}, "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end" }}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
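
For readability, here is the same request body pretty-printed (identical content, just reformatted); note where base_parameters sits relative to notebook_task:

{
  "name": "endpoint job",
  "existing_cluster_id": "xxx",
  "notebook_task": {
    "notebook_path": "path"
  },
  "base_parameters": {
    "input_multiple_polygons": "input_multiple_polygons",
    "input_date_start": "input_date_start",
    "input_date_end": "input_date_end"
  }
}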

To address point 2 I tried the following approaches, all without success:

2.1. Approach 1: input = spark.conf.get("base_parameters", "default")

2.2. Approach 2: input = spark.sparkContext.getConf().getAll()

2.3. Approach 3:

a = dbutils.widgets.getArgument("input_multiple_polygons", "default")
b = dbutils.widgets.getArgument("input_date_start", "default")
c = dbutils.widgets.getArgument("input_date_end", "default")
input = [a, b, c]

2.4. Approach 4 (as per the official documentation here):

a = dbutils.widgets.get("input_multiple_polygons")
b = dbutils.widgets.get("input_date_start")
c = dbutils.widgets.get("input_date_end")
input = [a, b, c]

The REST Jobs endpoint works fine and the run completes successfully; however, none of the four approaches outlined above seems to deliver the arguments to the PySpark context.

I am sure I am doing something wrong in either the curl call or the argument retrieval, but I can't identify the problem. Can anyone suggest where the issue may be?

Recommended answer

It looks like you are not enclosing base_parameters as an element within notebook_task. Can you try something like the below? I assume you are passing the right values for base_parameters, since the example you shared uses parameter values identical to the parameter names.

curl -n -X POST -H 'Content-Type: application/json' -d '{"name": "endpoint job", "existing_cluster_id": "xxx", "notebook_task": {"notebook_path": "path", "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end" }}}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
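
With base_parameters nested correctly, the widget-based retrieval from approach 2.4 should then work inside the notebook. A minimal notebook-side sketch, assuming the widget names match the keys sent above (dbutils is available implicitly in Databricks notebooks; declaring the widgets with dbutils.widgets.text gives them defaults for interactive runs, which job-supplied base_parameters override):

# Declare widgets with defaults so the notebook also runs interactively;
# values sent via base_parameters override these defaults in a job run.
dbutils.widgets.text("input_multiple_polygons", "default")
dbutils.widgets.text("input_date_start", "default")
dbutils.widgets.text("input_date_end", "default")

# Read the values supplied by the job run (or the defaults).
a = dbutils.widgets.get("input_multiple_polygons")
b = dbutils.widgets.get("input_date_start")
c = dbutils.widgets.get("input_date_end")
input = [a, b, c]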

An easy way to see what the JSON should look like is to define a job using the UI and then call api/2.0/jobs/get?job_id=<jobId> to inspect the JSON response.
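
For example, against the same workspace as above (the job_id value 123 is a placeholder):

curl -n -X GET "https://yyy.azuredatabricks.net/api/2.0/jobs/get?job_id=123"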
