Azure Data Factory connecting to Blob Storage via Access Key

Question

I'm trying to build a very basic data flow in Azure Data Factory that pulls a JSON file from blob storage, performs a transformation on some columns, and stores the result in a SQL database. I originally authenticated to the storage account using Managed Identity, but I get the error below when attempting to test the connection to the source:

com.microsoft.dataflow.broker.MissingRequiredPropertyException: account is a required property for [myStorageAccountName]. com.microsoft.dataflow.broker.PropertyNotFoundException: Could not extract value from [myStorageAccountName] - RunId: xxx

I also see the following message in the Factory Validation Output:

[MyDataSetName] AzureBlobStorage does not support SAS, MSI, or Service principal authentication in data flow.

With this I assumed that all I would need to do is switch my Blob Storage Linked Service to the Account Key authentication method. However, after switching to Account Key authentication and selecting my subscription and storage account, I get the following error when testing the connection:

Connection failed Fail to connect to https://[myBlob].blob.core.windows.net/: Error Message: The remote server returned an error: (403) Forbidden. (ErrorCode: 403, Detail: This request is not authorized to perform this operation., RequestId: xxxx), make sure the credential provided is valid. The remote server returned an error: (403) Forbidden.StorageExtendedMessage=, The remote server returned an error: (403) Forbidden. Activity ID: xxx.
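For reference, the linked service I switched to looks roughly like the JSON below (the name and the placeholder values are illustrative, not copied from my setup):

```json
{
    "name": "AzureBlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<myStorageAccountName>;AccountKey=<accountKey>;EndpointSuffix=core.windows.net"
        }
    }
}
```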

I've tried selecting the key from Azure directly and also entering it manually, and I get the same error either way. One thing to note is that the storage account only allows access from specified networks. I tried connecting to a different, public storage account and was able to access it fine. The ADF account has the Storage Account Contributor role, and I've added the IP address I'm currently working from as well as the IP ranges of Azure Data Factory that I found here: https://docs.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses
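For context, the storage account firewall is configured roughly like this ARM template fragment (the IP address is a placeholder, not my real one):

```json
"networkAcls": {
    "bypass": "AzureServices",
    "defaultAction": "Deny",
    "ipRules": [
        { "value": "203.0.113.10", "action": "Allow" }
    ]
}
```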

Also note, I have about 5 copy data tasks working perfectly fine with Managed Identity currently, but I need to start doing more complex operations.

This seems like a similar issue to Unable to create a linked service in Azure Data Factory, but the Storage Account Contributor and Owner roles I have assigned should supersede the Reader role suggested in the reply. I'm also not sure whether that poster is using a public or private storage account.

Thanks.

Answer

At the very bottom of the article linked above about whitelisting the integration runtime's IP ranges, Microsoft says the following:

When connecting to Azure Storage account, IP network rules have no effect on requests originating from the Azure integration runtime in the same region as the storage account. For more details, please refer to this article.

I spoke to Microsoft support about this, and the issue is that whitelisting public IP addresses does not work for resources within the same region: because the resources are on the same network, they connect to each other using private IPs rather than public ones.

There are four options to resolve the original issue:

  • Allow access from all networks under Firewalls and Virtual Networks in the storage account (obviously a concern if you are storing sensitive data). I tested this and it works.
  • Create a new Azure-hosted integration runtime that runs in a different region. I tested this as well: my ADF data flow runs in the East region, and a runtime I created in East 2 worked immediately. The issue for me here is that this would have to be reviewed by security before pushing to prod, because we'd be sending data across the public network; even though it's encrypted, it's still not as secure as two resources talking to each other on the same network.
  • Use a separate activity, such as an HDInsight activity like Spark or an SSIS package. I'm sure this would work, but the issue with SSIS is cost: we would have to spin up an SSIS DB and then pay for the compute, and the pipeline would also need extra activities to start and stop the SSIS runtime before and after execution. Also, I don't feel like learning Spark just for this.
  • Finally, the solution I used: for the data set, I created a new connection that replaced the Blob Storage connection with a Data Lake Storage Gen2 connection. It worked like a charm. Unlike the Blob Storage connection, Managed Identity is supported for Azure Data Lake Storage Gen2, as per this article. In general, the more specific the connection type, the more likely its features will work for your specific need.
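As a rough sketch, the replacement linked service definition looks like the following (the name and account placeholder are illustrative). With no credential section specified, Data Factory authenticates with its managed identity:

```json
{
    "name": "AzureDataLakeGen2LS",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<myStorageAccountName>.dfs.core.windows.net"
        }
    }
}
```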
