Azure Data Factory - Limit the number of Databricks pipelines running at the same time


Question

I am using ADF to execute Databricks notebooks. At this time, I have 6 pipelines, and they are executed consecutively.

Specifically, after the former is done, the latter is executed with multiple parameters by the loop box, and this keeps going. For example, after the first pipeline is done, it will trigger 3 instances of the second pipeline with different parameters, and each of these instances will trigger multiple instances of the third pipeline. As a result, the deeper I go, the more pipelines I have to run.

The issue I have is: when each pipeline is executed, it asks Databricks to allocate a cluster to run on. However, Databricks limits the number of cores that can be used per workspace, which causes pipeline instances to fail to run.

My question is: is there any solution to control the number of pipeline instances running at the same time, or any other way to handle my issue?
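One way to cap this concurrency inside ADF itself is the ForEach activity's `isSequential` and `batchCount` settings, which bound how many iterations (and therefore how many child pipeline executions, and clusters) run in parallel. A minimal sketch of the relevant fragment; the pipeline name, parameter name, and activity names are illustrative, not from the original post:

```json
{
    "name": "RunChildPipelines",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 2,
        "items": {
            "value": "@pipeline().parameters.paramList",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "TriggerChildPipeline",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "ChildPipeline",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true
                }
            }
        ]
    }
}
```

With `batchCount: 2` and `waitOnCompletion: true`, at most two child runs (and thus two cluster requests) are in flight at once, regardless of how many parameter sets the loop iterates over.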

Thanks in advance :-)

Answer

Why does this issue occur?

Note: Creating a Databricks cluster always has a dependency on the number of cores available in the subscription.

Before creating any Databricks cluster, make sure enough cores are available in the selected region for the chosen VM family's vCPUs.

You can check the core limit of your subscription by going to Azure Portal => Subscriptions => select your subscription => Settings => "Usage + quotas" => check the usage quota available for each region.

Example: if your subscription has more than 72 cores available, the ADF runs succeed; otherwise they fail with an error like:

Activity Validate failed: Databricks execution failed with error message: Unexpected failure while waiting for the cluster to be ready. Cause Unexpected state for cluster (job-200-run-1):  Could not launch cluster due to cloud provider failures. azure_error_code: OperationNotAllowed, azure_error_message: Operation results in exceeding quota limits of Core. Maximum allowed: 350, Current in use: 344

I'm trying to create 6 pipelines with Databricks clusters with 2 worker nodes each, using VM size Standard_DS3_v2 (4 cores per node). This means it requires:

(6 pipelines) * (1 driver node + 2 worker nodes) * (4 cores) = 72 cores.
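The arithmetic above can be sketched as a quick quota check; the node counts and core size are the example's values (Standard_DS3_v2), not fixed constants:

```python
def required_cores(pipelines: int, workers_per_cluster: int, cores_per_node: int) -> int:
    """Cores needed when each pipeline spins up its own cluster:
    one driver node plus the worker nodes, all of the same VM size."""
    nodes_per_cluster = 1 + workers_per_cluster  # driver + workers
    return pipelines * nodes_per_cluster * cores_per_node

# 6 pipelines, 2 workers each, Standard_DS3_v2 = 4 vCPUs per node
print(required_cores(6, 2, 4))  # -> 72
```

Comparing this number against the regional vCPU quota shown under "Usage + quotas" tells you in advance whether the fan-out will hit the limit.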

Note: Creating a Databricks Spark cluster requires more than 4 cores, i.e. a minimum of 4 cores for the driver type and 4 cores for the worker type.

Solutions for this issue:

  1. Increase the core limit by raising a ticket with the billing and subscription team. With this option you only pay for the cores once they are actually used.
  2. Limit the job frequency to limit the number of clusters, or consider using a single job to copy multiple files, so as to limit the cluster creation that drains your subscription's cores.
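If the fan-out cannot be restructured inside ADF, a caller-side throttle is another option for the second approach. A minimal sketch: `trigger_pipeline_run` is a hypothetical placeholder for whatever API call you use to start a pipeline run and wait for it, and the semaphore bounds how many runs (and thus clusters) are in flight at once:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_RUNS = 2  # tune so concurrent clusters fit the core quota
run_slots = threading.BoundedSemaphore(MAX_CONCURRENT_RUNS)

def trigger_pipeline_run(params: dict) -> str:
    # Hypothetical placeholder: call the ADF REST API or SDK here and
    # block until the run completes, returning a run identifier.
    return f"run-for-{params['id']}"

def throttled_run(params: dict) -> str:
    # At most MAX_CONCURRENT_RUNS threads hold a slot at any moment;
    # the rest wait here instead of requesting clusters.
    with run_slots:
        return trigger_pipeline_run(params)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(throttled_run, [{"id": i} for i in range(6)]))
```

All 6 runs still execute, but never more than 2 at a time, so the peak core demand stays at (2 clusters) * (3 nodes) * (4 cores) = 24 instead of 72.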

To request an increase of one or more resources that support such an increase, submit an Azure support request (select "Quota" for Issue type).

Issue type: Service and subscription limits (quotas)

Reference: Total regional vCPU limit increases

Hope this helps. Do let us know if you have any further queries.

Do click on "Mark as Answer" and upvote the post that helps you; this can be beneficial to other community members.

